ISOPRENOID PRODUCTION
BACKGROUND

1. Technical Field

The invention relates to methods and materials involved in the production of
isoprenoids.

2. Background Information 

Isoprenoids are compounds that have at least one five-carbon isoprenoid unit.
Examples of isoprenoid compounds include, without limitation, carotenoids, isoprenes,
sterols, terpenes, and ubiquinones. Various enzymatic pathways in plants, animals, and 
microorganisms result in the synthesis of isoprenoid compounds. Typically, isopentenyl 
diphosphate (IPP), dimethylallyl diphosphate (DMAPP), or combinations thereof are 
polymerized to form isoprenoid compounds. 

Two pathways can be used to produce IPP. The first pathway, known as the
mevalonate-dependent pathway, produces IPP from 3-hydroxy-3-methylglutaryl
Coenzyme A (HMGCoA) in a series of reactions. The second pathway, known as the
mevalonate-independent pathway, produces IPP from 1-deoxyxylulose-5-phosphate
(DXP) in a series of reactions. One of those reactions involves the use of DXP synthase
(DXS) to catalyze the condensation of pyruvate and glyceraldehyde-3-phosphate to form
DXP.

Once made, IPP can be used to make various isoprenoid compounds.
Specifically, enzymes known as polyprenyl diphosphate synthases catalyze
polymerization reactions that combine IPP and DMAPP to form compounds known as
polyprenyl diphosphates. For example, decaprenyl diphosphate synthase (DDS) catalyzes
the consecutive condensation of IPP with allylic diphosphates to produce decaprenyl
diphosphate. Decaprenyl diphosphate is a polyprenyl diphosphate that can be used to
form the side chain of a ubiquinone known as CoQ(10). Other polyprenyl diphosphate
synthases include, without limitation, farnesyl-, geranyl-, and octaprenyl diphosphate
synthases.
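
The relationship between the five-carbon unit and the polyprenyl chains named above can be sketched with a little arithmetic. The unit counts below follow standard prenyl nomenclature rather than values stated in this specification, and the choice of farnesyl diphosphate as the allylic primer is only an illustrative assumption; the names `PRENYL_UNITS`, `chain_carbons`, and `ipp_condensations` are hypothetical.

```python
# Each isoprenoid unit contributes five carbons (as stated in the text above).
CARBONS_PER_UNIT = 5

# Five-carbon units per prenyl chain. These counts come from standard
# nomenclature and are an assumption of this sketch, not data from the text.
PRENYL_UNITS = {
    "geranyl": 2,
    "farnesyl": 3,
    "octaprenyl": 8,
    "decaprenyl": 10,
}


def chain_carbons(name: str) -> int:
    """Number of carbons in the named prenyl chain."""
    return PRENYL_UNITS[name] * CARBONS_PER_UNIT


def ipp_condensations(primer: str, product: str) -> int:
    """Number of IPP units added when an allylic primer is extended to the product."""
    added = PRENYL_UNITS[product] - PRENYL_UNITS[primer]
    if added < 0:
        raise ValueError("product chain is shorter than the primer")
    return added


if __name__ == "__main__":
    print(chain_carbons("decaprenyl"))                  # carbons in the CoQ(10) side chain
    print(ipp_condensations("farnesyl", "decaprenyl"))  # IPP additions from a farnesyl primer
```

Under these assumptions, a decaprenyl chain has fifty carbons, and extending a farnesyl primer to decaprenyl diphosphate takes seven consecutive IPP condensations.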
SUMMARY

The invention relates to methods and materials involved in the production of
isoprenoid compounds. Specifically, the invention provides nucleic acid molecules,
polypeptides, host cells, and methods that can be used to produce isoprenoid compounds.
Isoprenoid compounds are both biologically and commercially important. For example,
the nutritional industry uses isoprenoid compounds as nutritional supplements, while the
perfume industry uses isoprenoid compounds as fragrances. The nucleic acid molecules
described herein can be used to engineer host cells having the ability to produce particular
isoprenoid compounds. The polypeptides described herein can be used in cell-free
systems to make particular isoprenoid compounds. The host cells described herein can be
used in culture systems to produce large quantities of particular isoprenoid compounds.

In general, the invention features an isolated nucleic acid containing a nucleic acid
sequence having a length and a percent identity to the sequence set forth in SEQ ID NO:1
over the length, wherein the point defined by the length and the percent identity is within
the area defined by points A, B, C, and D of Figure 26, wherein point A has coordinates
(3626, 100), point B has coordinates (3626, 65), point C has coordinates (50, 65), and
point D has coordinates (12, 100). The point B can have coordinates (3626, 85). The
point C can have coordinates (100, 65). The point C can have coordinates (50, 85). The
point D can have coordinates (15, 100). The nucleic acid sequence can encode a
polypeptide. The polypeptide can have DXS activity. The nucleic acid sequence can be
as set forth in SEQ ID NO:1.
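
The length and percent identity criterion above amounts to asking whether a point lies inside a four-sided region. The following is a minimal sketch of that test, assuming that each coordinate pair is (length, percent identity), that the region is bounded by straight lines joining A to B, B to C, C to D, and D to A, and that points on the boundary count as inside. Figure 26 is not reproduced here, so these are assumptions; the function name `in_region` is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (length, percent identity)

# Points A, B, C, and D as given for SEQ ID NO:1.
SEQ1_REGION = [(3626, 100), (3626, 65), (50, 65), (12, 100)]


def in_region(point: Point, vertices: Sequence[Point]) -> bool:
    """Return True if point is inside or on the edge of a convex polygon.

    The vertices must be listed in order around the polygon. The point is
    inside when it lies on the same side of every edge.
    """
    sign = 0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Cross product of the edge vector with the vector to the point.
        cross = (x2 - x1) * (point[1] - y1) - (y2 - y1) * (point[0] - x1)
        if cross == 0:
            continue  # on the line through this edge; other edges decide
        current = 1 if cross > 0 else -1
        if sign == 0:
            sign = current
        elif current != sign:
            return False
    return True


if __name__ == "__main__":
    print(in_region((1000, 90), SEQ1_REGION))  # long sequence, 90 percent identity
    print(in_region((20, 70), SEQ1_REGION))    # short sequence, 70 percent identity
```

Under these assumptions, a 1000-nucleotide sequence with 90 percent identity falls inside the region, while a 20-nucleotide sequence with 70 percent identity falls outside it because the edge from C to D requires longer sequences at lower identities. The same function can be applied to the other regions by substituting their A, B, C, and D coordinates.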

In one embodiment, the invention features an isolated nucleic acid containing a 
nucleic acid sequence having a length and a percent identity to the sequence set forth in 
SEQ ID NO:2 over the length, wherein the point defined by the length and the percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (1926, 100), point B has coordinates (1926, 65), point C has coordinates 
(50, 65), and point D has coordinates (12, 100). The nucleic acid sequence can encode a 
polypeptide. The polypeptide can have DXS activity. 

In another embodiment, the invention features an isolated nucleic acid containing
a nucleic acid sequence, wherein the nucleic acid sequence encodes a polypeptide
containing an amino acid sequence, wherein the amino acid sequence has a length and a
percent identity to the sequence set forth in SEQ ID NO:3 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (641, 100), point B has
coordinates (641, 65), point C has coordinates (25, 65), and point D has coordinates (5,
100). The polypeptide can have DXS activity.

Another embodiment of the invention features an isolated nucleic acid containing 
a nucleic acid sequence having a length and a percent identity to the sequence set forth in 
SEQ ID NO:37 over the length, wherein the point defined by the length and the percent 
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (1990, 100), point B has coordinates (1990, 65), point C has coordinates
(50, 65), and point D has coordinates (16, 100). The point B can have coordinates (1990, 
85). The point C can have coordinates (100, 65). The point C can have coordinates (50,
85). The point D can have coordinates (20, 100). The nucleic acid sequence can encode 
a polypeptide. The polypeptide can have DDS activity. The nucleic acid sequence can be
as set forth in SEQ ID NO:37.

Another embodiment of the invention features an isolated nucleic acid containing 
a nucleic acid sequence having a length and a percent identity to the sequence set forth in 
SEQ ID NO:38 over the length, wherein the point defined by the length and the percent 
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (1002, 100), point B has coordinates (1002, 65), point C has coordinates
(50, 65), and point D has coordinates (16, 100). The nucleic acid sequence can encode a 
polypeptide. The polypeptide can have DDS activity. 

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence, wherein the nucleic acid sequence encodes a polypeptide
containing an amino acid sequence, wherein the amino acid sequence has a length and a
percent identity to the sequence set forth in SEQ ID NO:39 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (333, 100), point B has
coordinates (333, 65), point C has coordinates (25, 65), and point D has coordinates (5,
100). The polypeptide can have DDS activity.

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence having a length and a percent identity to the sequence set forth in
SEQ ID NO:40 over the length, wherein the point defined by the length and the percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (1833, 100), point B has coordinates (1833, 65), point C has coordinates
(50, 65), and point D has coordinates (16, 100). The point B can have coordinates (1833,
85). The point C can have coordinates (100, 65). The point C can have coordinates (50,
85). The point D can have coordinates (20, 100). The nucleic acid sequence can encode
a polypeptide. The polypeptide can have DDS activity. The nucleic acid sequence can be
as set forth in SEQ ID NO:40.

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence having a length and a percent identity to the sequence set forth in
SEQ ID NO:41 over the length, wherein the point defined by the length and the percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (1014, 100), point B has coordinates (1014, 65), point C has coordinates
(50, 65), and point D has coordinates (16, 100). The nucleic acid sequence can encode a
polypeptide. The polypeptide can have DDS activity.

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence, wherein the nucleic acid sequence encodes a polypeptide
containing an amino acid sequence, wherein the amino acid sequence has a length and a
percent identity to the sequence set forth in SEQ ID NO:42 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (337, 100), point B has
coordinates (337, 65), point C has coordinates (25, 65), and point D has coordinates (5,
100). The polypeptide can have DDS activity.

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence having a length and a percent identity to the sequence set forth in
SEQ ID NO:95 over the length, wherein the point defined by the length and the percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (2017, 100), point B has coordinates (2017, 65), point C has coordinates
(50, 65), and point D has coordinates (16, 100). The point B can have coordinates (2017,
85). The point C can have coordinates (100, 65). The point C can have coordinates (50,
85). The point D can have coordinates (20, 100). The nucleic acid sequence can encode
a polypeptide. The polypeptide can have DXR activity. The nucleic acid sequence can
be as set forth in SEQ ID NO:95.

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence having a length and a percent identity to the sequence set forth in
SEQ ID NO:96 over the length, wherein the point defined by the length and the percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (1161, 100), point B has coordinates (1161, 65), point C has coordinates
(50, 65), and point D has coordinates (16, 100). The nucleic acid sequence can encode a
polypeptide. The polypeptide can have DXR activity.

Another embodiment of the invention features an isolated nucleic acid containing 
a nucleic acid sequence, wherein the nucleic acid sequence encodes a polypeptide 
containing an amino acid sequence, wherein the amino acid sequence has a length and a 
percent identity to the sequence set forth in SEQ ID NO:97 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (386, 100), point B has 
coordinates (386, 65), point C has coordinates (25, 65), and point D has coordinates (5, 
100). The polypeptide can have DXR activity. 

Another embodiment of the invention features an isolated nucleic acid containing
a nucleic acid sequence of at least 12 nucleotides, wherein the isolated nucleic acid
hybridizes under hybridization conditions to the sense or antisense strand of a nucleic
acid molecule, the sequence of the nucleic acid molecule being the sequence set forth in
SEQ ID NO:1, 2, 37, 38, 40, 41, 95, or 96. The nucleic acid sequence can be at least 50
nucleotides (e.g., at least 100, 200, 300, 400, 500, or more). The nucleic acid sequence
can encode a polypeptide. The polypeptide can have DXS, DDS, or DXR activity.

In another aspect, the invention features a substantially pure polypeptide
containing an amino acid sequence, wherein the amino acid sequence has a length and a
percent identity to the sequence set forth in SEQ ID NO:3 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (641, 100), point B has
coordinates (641, 65), point C has coordinates (25, 65), and point D has coordinates (5,
100). The polypeptide can have DXS activity.

In another embodiment, the invention features a substantially pure polypeptide
containing an amino acid sequence, wherein the amino acid sequence has a length and a
percent identity to the sequence set forth in SEQ ID NO:39 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (333, 100), point B has 
coordinates (333, 65), point C has coordinates (25, 65), and point D has coordinates (5, 
100). The polypeptide can have DDS activity. 

Another embodiment of the invention features a substantially pure polypeptide
containing an amino acid sequence, wherein the amino acid sequence has a length and a
percent identity to the sequence set forth in SEQ ID NO:42 over the length, wherein the
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (337, 100), point B has
coordinates (337, 65), point C has coordinates (25, 65), and point D has coordinates (5,
100). The polypeptide can have DDS activity.

Another embodiment of the invention features a substantially pure polypeptide 
containing an amino acid sequence, wherein the amino acid sequence has a length and a 
percent identity to the sequence set forth in SEQ ID NO:97 over the length, wherein the 
point defined by the length and the percent identity is within the area defined by points A,
B, C, and D of Figure 26, wherein point A has coordinates (386, 100), point B has
coordinates (386, 65), point C has coordinates (25, 65), and point D has coordinates (5,
100). The polypeptide can have DXR activity. 

Another aspect of the invention features a host cell containing an isolated nucleic
acid of claim 1, 9, 12, 14, 22, 25, 27, 35, 38, 40, 48, 51, or 53. The host cell can be
prokaryotic. The host cell can be a Rhodobacter, Sphingomonas, or Escherichia cell.
The host cell can contain an exogenous nucleic acid that encodes a polypeptide having
DDS, DXS, ODS, SDS, DXR, 4-diphosphocytidyl-2C-methyl-D-erythritol synthase, 4-
diphosphocytidyl-2C-methyl-D-erythritol kinase, or chorismate lyase activity. The host
cell can contain an exogenous nucleic acid containing a UbiC sequence or LytB
sequence. The host cell can contain an exogenous nucleic acid containing a UbiC
sequence and LytB sequence. The host cell can contain a non-functional crtE sequence,
ppsR sequence, or ccoN sequence. The host cell can contain a non-functional crtE
sequence, ppsR sequence, and ccoN sequence.

Another embodiment of the invention features a host cell containing an exogenous 
nucleic acid and a non-functional crtE sequence, ppsR sequence, or ccoN sequence,
wherein the exogenous nucleic acid is within a crtE, ppsR, or ccoN locus of the host cell.

Another embodiment of the invention features a host cell containing a genomic 
deletion, wherein the deletion comprises at least a portion of a crtE sequence, ppsR 
sequence, or ccoN sequence, and wherein the host cell comprises a non-functional crtE 
sequence, ppsR sequence, or ccoN sequence. 

Another aspect of the invention features a method for increasing production of
CoQ(10) in a cell having endogenous DDS activity. The method includes inserting a
nucleic acid molecule containing a nucleic acid sequence that encodes a polypeptide
having DDS activity into the cell such that production of CoQ(10) is increased. The
nucleic acid molecule can contain an isolated nucleic acid of claim 14, 22, 25, 27, 35, 38,
or 53. The production of CoQ(10) can be increased at least about 5 percent as compared
to a control cell lacking the inserted nucleic acid molecule. The cell can be a
Rhodobacter or Sphingomonas cell. The cell can be a membraneous bacterium or highly
membraneous bacterium. The method can also include inserting a second nucleic acid
molecule containing a nucleotide sequence that encodes a polypeptide having DXS
activity into the cell. The second nucleic acid molecule can contain an isolated nucleic
acid of claim 1, 9, or 12.

In another embodiment, the invention features a method for increasing production
of CoQ(10) in a cell having endogenous DDS activity. The method includes inserting a
nucleic acid molecule containing a nucleic acid sequence that encodes a polypeptide
having DXS activity into the cell such that production of CoQ(10) is increased. The
production of CoQ(10) can be increased at least about 5 percent as compared to a control
cell lacking the inserted nucleic acid molecule. The cell can be a Rhodobacter or
Sphingomonas cell. The nucleic acid molecule can contain an isolated nucleic acid of
claim 1, 9, or 12. The cell can be a membraneous bacterium or highly membraneous
bacterium. The method can also include inserting a second nucleic acid molecule
containing a nucleotide sequence that encodes a polypeptide having DDS activity into the
cell. The second nucleic acid molecule can contain an isolated nucleic acid of claim 14,
22, 25, 27, 35, 38, or 53.

Another embodiment of the invention features a method for increasing production
of CoQ(10) in a membraneous bacterium. The method includes inserting a nucleic acid
molecule containing a nucleic acid sequence that encodes a polypeptide having DDS
activity into the bacterium such that production of CoQ(10) is increased.

Another embodiment of the invention features a method for increasing production
of CoQ(10) in a highly membraneous bacterium. The method includes inserting a nucleic
acid molecule containing a nucleic acid sequence that encodes a polypeptide having DDS
activity into the highly membraneous bacterium such that production of CoQ(10) is
increased.

Another embodiment of the invention features a method for making an isoprenoid.
The method includes culturing a cell under conditions wherein the cell produces the
isoprenoid, wherein the cell contains at least one exogenous nucleic acid that encodes at
least one polypeptide, wherein the cell produces more of the isoprenoid than a
comparable cell lacking the at least one exogenous nucleic acid. The cell can be a
Rhodobacter or Sphingomonas cell. The isoprenoid can be CoQ(10). The at least one
polypeptide can have DDS, DXS, ODS, SDS, DXR, 4-diphosphocytidyl-2C-methyl-D-
erythritol synthase, 4-diphosphocytidyl-2C-methyl-D-erythritol kinase, or chorismate
lyase activity. The at least one polypeptide can be a UbiC polypeptide or a LytB
polypeptide. The cell can contain a non-functional crtE sequence, ppsR sequence, or
ccoN sequence. The cell can contain a non-functional crtE sequence, ppsR sequence, and
ccoN sequence. The cell can contain a genomic deletion, wherein the deletion contains at
least a portion of a crtE sequence, ppsR sequence, or ccoN sequence, and wherein the cell
contains a non-functional crtE sequence, ppsR sequence, or ccoN sequence.

Another embodiment of the invention features a method for making an isoprenoid. 
The method includes culturing a genetically modified cell under conditions wherein the 
cell produces the isoprenoid. The isoprenoid can be CoQ(10). The cell can contain an
exogenous nucleic acid. The cell can contain a genomic deletion. 

Unless otherwise defined, all technical and scientific terms used herein have the
same meaning as commonly understood by one of ordinary skill in the art to which this
invention pertains. Although methods and materials similar or equivalent to those
described herein can be used in the practice or testing of the present invention, suitable
methods and materials are described below. All publications, patent applications, patents,
and other references mentioned herein are incorporated by reference in their entirety. In
case of conflict, the present specification, including definitions, will control. In addition,
the materials, methods, and examples are illustrative only and not intended to be limiting.

Other features and advantages of the invention will be apparent from the 
following detailed description, and from the claims. 

DESCRIPTION OF DRAWINGS

Figure 1 is a diagram of a pathway for producing CoQ(10).

Figure 2 is a listing of a nucleic acid sequence that encodes a Sphingomonas
trueperi (ATCC 12417) polypeptide having DXS activity (SEQ ID NO:1). The start
codon is the ATG at nucleotide number 182, and the stop codon is the TAA at nucleotide
number 2107. The probable ribosome binding site is at nucleotide numbers 175-178.
This sequence contains an open reading frame as well as 5' and 3' untranslated
sequences.

Figure 3 is a listing of a nucleic acid sequence that encodes a Sphingomonas 
trueperi (ATCC 12417) polypeptide having DXS activity (SEQ ID NO:2). This sequence
corresponds to the open reading frame.

Figure 4 is a listing of an amino acid sequence of a Sphingomonas trueperi 
(ATCC 12417) polypeptide having DXS activity (SEQ ID NO:3). 

Figure 5 is a sequence pile-up of 14 nucleic acid sequences that encode
polypeptides having DXS activity. STdxsdna represents the nucleic acid sequence set
forth in SEQ ID NO:2; CRdxsdna represents a nucleic acid sequence from
Chlamydomonas reinhardtii (GenBank accession number AJ007559; SEQ ID NO:4);
CJdxsdna represents a nucleic acid sequence from Campylobacter jejuni (GenBank
accession number AL139074; SEQ ID NO:5); PAdxsdna represents a nucleic acid
sequence from Pseudomonas aeruginosa (GenBank accession number AE004821; SEQ
ID NO:6); LEdxsdna represents a nucleic acid sequence from Lycopersicon esculentum
(GenBank accession number AF143812; SEQ ID NO:7); MTdxsdna represents a nucleic
acid sequence from Mycobacterium tuberculosis (GenBank accession number Z96072;
SEQ ID NO:8); RSdxs1dna represents a nucleic acid sequence from a Rhodobacter
sphaeroides dxs1 gene (SEQ ID NO:9); RSdxs2dna represents a nucleic acid sequence
from a Rhodobacter sphaeroides dxs2 gene (SEQ ID NO:10); SPCCdxsdna represents a
nucleic acid sequence from Synechococcus PCC6301 (GenBank accession number
Y18874; SEQ ID NO:11); ECdxsdna represents a nucleic acid sequence from Escherichia
coli (GenBank accession number AF035440; SEQ ID NO:12); NMdxsdna represents a
nucleic acid sequence from Neisseria meningitidis (GenBank accession number
AL162753; SEQ ID NO:13); HIdxsdna represents a nucleic acid sequence from
Haemophilus influenzae (GenBank accession number U32822; SEQ ID NO:14); SSdxsdna
represents a nucleic acid sequence from Streptomyces sp. CL190 (GenBank accession
number AB026631; SEQ ID NO:16); and HPdxsdna represents a nucleic acid sequence
from Helicobacter pylori 26695 (GenBank accession number AE000552; SEQ ID
NO:17).

Figure 6 is a sequence pile-up of 21 amino acid sequences of polypeptides having
DXS activity. STdxsp represents an amino acid sequence set forth in SEQ ID NO:3;
AAdxsp represents an amino acid sequence from Aquifex aeolicus (GenBank accession
number 067036; SEQ ID NO:18); BSdxsp represents an amino acid sequence from
Bacillus subtilis (GenBank accession number P54523; SEQ ID NO:19); CRdxsp
represents an amino acid sequence from Chlamydomonas reinhardtii (GenBank accession
number CAA07554; SEQ ID NO:20); CJdxsp represents an amino acid sequence from
Campylobacter jejuni (GenBank accession number CAB72788; SEQ ID NO:21); PAdxsp
represents an amino acid sequence from Pseudomonas aeruginosa (GenBank accession
number AAG07431; SEQ ID NO:15); LEdxsp represents an amino acid sequence from
Lycopersicon esculentum (GenBank accession number AAD38941; SEQ ID NO:22);
MLdxsp represents an amino acid sequence from Mycobacterium leprae (GenBank
accession number Q50000; SEQ ID NO:23); MTdxsp represents an amino acid sequence
from Mycobacterium tuberculosis (GenBank accession number CAB09493; SEQ ID
NO:24); RCdxsp represents an amino acid sequence from Rhodobacter capsulatus
(GenBank accession number P26242; SEQ ID NO:25); RSdxs1p represents an amino
acid sequence encoded by a Rhodobacter sphaeroides dxs1 gene (SEQ ID NO:26);






WO 02/26933 



PCT/US01/30328 



RSdxs2p represents an amino acid sequence encoded by a Rhodobacter sphaeroides dxs2
gene (SEQ ID NO:27); SPCCdxsp represents an amino acid sequence from
Synechococcus PCC6301 (GenBank accession number CAB60078; SEQ ID NO:28);
SPdxsp represents an amino acid sequence from Synechocystis PCC6803 (GenBank
accession number P73067; SEQ ID NO:29); TMdxsp represents an amino acid sequence
from Thermotoga maritima (GenBank accession number Q9X291; SEQ ID NO:30);
ECdxsp represents an amino acid sequence from Escherichia coli (GenBank accession
number D64771; SEQ ID NO:31); NMdxsp represents an amino acid sequence from
Neisseria meningitidis (GenBank accession number CAB83880; SEQ ID NO:32); HIdxsp
represents an amino acid sequence from Haemophilus influenzae (GenBank accession
number B64172; SEQ ID NO:33); PFdxsp represents an amino acid sequence from
Plasmodium falciparum (GenBank accession number AAD03740; SEQ ID NO:34);
SSdxsp represents an amino acid sequence from Streptomyces sp. CL190 (GenBank
accession number BAA85847; SEQ ID NO:35); and HPdxsp represents an amino acid
sequence from Helicobacter pylori 26695 (GenBank accession number AAD07422; SEQ
ID NO:36).

Figure 7 is a listing of a nucleic acid sequence that encodes a Rhodobacter 
sphaeroides (ATCC 17023) polypeptide having DDS activity (SEQ ID NO:37). The start 
codon is the ATG at nucleotide number 372, and the stop codon is the TGA at nucleotide 
number 1373. The probable ribosome binding site is at nucleotide numbers 363-366.
This sequence contains an open reading frame as well as 5' and 3' untranslated 
sequences. 

Figure 8 is a listing of a nucleic acid sequence that encodes a Rhodobacter 
sphaeroides (ATCC 17023) polypeptide having DDS activity (SEQ ID NO:38). This 
sequence corresponds to the open reading frame.

Figure 9 is a listing of an amino acid sequence of a Rhodobacter sphaeroides 
(ATCC 17023) polypeptide having DDS activity (SEQ ID NO:39). 

Figure 10 is a listing of a nucleic acid sequence that encodes a Sphingomonas 
trueperi (ATCC 12417) polypeptide having DDS activity (SEQ ID NO:40). The start
codon is the ATG at nucleotide number 605, and the stop codon is the TGA at nucleotide
number 1618. The probable ribosome binding site is at nucleotide numbers 590-594.
This sequence contains an open reading frame as well as 5' and 3' untranslated
sequences. 

Figure 11 is a listing of a nucleic acid sequence that encodes a Sphingomonas
trueperi (ATCC 12417) polypeptide having DDS activity (SEQ ID NO:41). This
sequence corresponds to the open reading frame.

Figure 12 is a listing of an amino acid sequence of a Sphingomonas trueperi 
(ATCC 12417) polypeptide having DDS activity (SEQ ID NO:42). This sequence 
corresponds to the open reading frame. 
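
The start and stop positions quoted for Figures 2, 7, and 10 can be related to the open reading frame lengths with simple arithmetic. The sketch below assumes that the nucleotide number given for each stop codon is the last nucleotide of that codon and that the stop codon does not encode a residue; both are assumptions of this example, and the helper names `orf_length` and `encoded_residues` are hypothetical.

```python
# (start codon position, stop codon position) as quoted in the figure descriptions.
# Assumption: the stop position is the last nucleotide of the stop codon.
ORF_COORDINATES = {
    "Figure 2 (SEQ ID NO:1, DXS)": (182, 2107),
    "Figure 7 (SEQ ID NO:37, DDS)": (372, 1373),
    "Figure 10 (SEQ ID NO:40, DDS)": (605, 1618),
}


def orf_length(start: int, stop_end: int) -> int:
    """Length in nucleotides of an open reading frame, stop codon included."""
    return stop_end - start + 1


def encoded_residues(orf_nt: int) -> int:
    """Number of amino acid residues encoded, excluding the stop codon."""
    if orf_nt % 3 != 0:
        raise ValueError("open reading frame length is not a multiple of three")
    return orf_nt // 3 - 1


if __name__ == "__main__":
    for label, (start, stop_end) in ORF_COORDINATES.items():
        nt = orf_length(start, stop_end)
        print(f"{label}: {nt} nt, {encoded_residues(nt)} residues")
```

Under these assumptions the three open reading frames come to 1926, 1002, and 1014 nucleotides, encoding 641, 333, and 337 residues. These values agree with the lengths used as the point A coordinates for SEQ ID NO:2, 38, and 41 and for SEQ ID NO:3, 39, and 42 in the Summary.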

Figure 13 is a sequence pile-up of five nucleic acid sequences that encode
polypeptides having DDS activity. RSddsdna represents the nucleic acid sequence set
forth in SEQ ID NO:38; STddsdna represents the nucleic acid sequence set forth in SEQ
ID NO:41; SPddsdna represents a nucleic acid sequence from Schizosaccharomyces
pombe (GenBank accession number D84311; SEQ ID NO:43); GSddsdna represents a
nucleic acid sequence from Gluconobacter suboxydans (GenBank accession number
AB006850; SEQ ID NO:44); and RCddsdna represents a nucleic acid sequence from
Rhodobacter capsulatus (U.S. Patent No. 6,103,488; SEQ ID NO:45).

Figure 14 is a sequence pile-up of five amino acid sequences of polypeptides 
having DDS activity. RSddsp represents the amino acid sequence set forth in SEQ ID 
NO:39; STddsp represents the amino acid sequence set forth in SEQ ID NO:42; GSddsp 
represents an amino acid sequence from Gluconobacter suboxydans (GenBank accession
number BAA32241; SEQ ID NO:46); SPddsp represents an amino acid sequence from
Schizosaccharomyces pombe (GenBank accession number CAB66154; SEQ ID NO:47);
and RCddsp represents an amino acid sequence from Rhodobacter capsulatus (U.S.
Patent No. 6,103,488; SEQ ID NO:48).

Figure 15 is a sequence pile-up of three amino acid sequences of polypeptides
having DXS activity. Hpdxsp represents the amino acid sequence set forth in SEQ ID
NO:36; Ecdxsp represents the amino acid sequence set forth in SEQ ID NO:31; and 
Hidxsp represents the amino acid sequence set forth in SEQ ID NO:33. 

Figure 16 is a sequence pile-up of four amino acid sequences of polypeptides
having DDS, ODS (octaprenyl diphosphate synthase), or SDS (solanesyl diphosphate
synthase) activity. Rcsdsp represents an amino acid sequence from Rhodobacter
capsulatus having SDS activity (SEQ ID NO:49); Rpodsp represents an amino acid
sequence from Rickettsia prowazekii having ODS activity (SEQ ID NO:50); Gsddsp
represents the amino acid sequence set forth in SEQ ID NO:46; and Ecodsp represents an
amino acid sequence from Escherichia coli ispB having ODS activity (SEQ ID NO:51).

Figure 17 is a sequence pile-up of five amino acid sequences of polypeptides
having DDS, ODS, or SDS activity. Rpodsp represents the amino acid sequence set forth
in SEQ ID NO:50; Gsddsp represents the amino acid sequence set forth in SEQ ID
NO:46; Ecodsp represents the amino acid sequence set forth in SEQ ID NO:51; Hiodsp
represents an amino acid sequence from Haemophilus influenzae having ODS activity
(SEQ ID NO:52); and Rcsdsp represents the amino acid sequence set forth in SEQ ID
NO:49.

Figure 18 is a diagram of a construct designated appUC18-SHDXS.

Figure 19 is a diagram of a construct designated appUC18-RSdds.

Figure 20 is a diagram of a construct designated appUC18-SHDDS.

Figure 21 is a mass chromatogram obtained from a MG1655 PUC18 specimen.

Figure 22 is a mass chromatogram obtained from a MG1655 PUC18-DDS
specimen.

Figure 23 is a mass spectrum obtained from a MG1655 PUC18 specimen.

Figure 24 is a mass spectrum obtained from a MG1655 PUC18-DDS specimen.

Figure 25 is a mass spectrum obtained from a MG1655 PUC18-DDS specimen.

Figure 26 is a graph plotting length and percent identity with points A, B, C, and 
D defining an area indicated by shading. 

Figure 27 is a sequence pile-up of seven amino acid sequences of polypeptides
having DXR activity. Bsdxrp represents an amino acid sequence from Bacillus subtilis
(SEQ ID NO:98); Hmdxrp represents an amino acid sequence from Haemophilus
influenzae (SEQ ID NO:99); Ecdxrp represents an amino acid sequence from Escherichia
coli (SEQ ID NO:100); Zmdxrp represents an amino acid sequence from Zymomonas
mobilis (SEQ ID NO:101); Sldxrp represents an amino acid sequence from
Synechococcus leopoliensis (SEQ ID NO:102); Ssdxrp represents an amino acid sequence
from Synechocystis sp. PCC6803 (SEQ ID NO:103); and Mtdxrp represents an amino
acid sequence from Mycobacterium tuberculosis (SEQ ID NO:104).
Figure 28 is a listing of a nucleic acid sequence that encodes a Sphingomonas
trueperi polypeptide having DXR activity (SEQ ID NO:95). The start codon is the GTG
at either nucleotide number 575 or 578, and the stop codon is the TGA at nucleotide
number 1733. This sequence contains an open reading frame as well as 5' and 3'
untranslated sequences.

Figure 29 is a listing of a nucleic acid sequence that encodes a Sphingomonas 
trueperi polypeptide having DXR activity (SEQ ID NO:96). This sequence corresponds 
to the open reading frame. 

Figure 30 is a listing of an amino acid sequence of a Sphingomonas trueperi
polypeptide having DXR activity (SEQ ID NO:97).

Figure 31 is a sequence pile-up of twelve nucleic acid sequences that encode
polypeptides having DXR activity. Stdxrcds represents the nucleic acid sequence set
forth in SEQ ID NO:96; Padxrd represents a nucleic acid sequence from Pseudomonas
aeruginosa (SEQ ID NO:105); Zmdxrd represents a nucleic acid sequence from
Zymomonas mobilis (SEQ ID NO:106); Sgdxrd represents a nucleic acid sequence from
Streptomyces griseolosporeus (SEQ ID NO:107); Nmdxrd represents a nucleic acid
sequence from Neisseria meningitidis (SEQ ID NO:108); Ecdxrd represents a nucleic
acid sequence from Escherichia coli (SEQ ID NO:109); Sldxrd represents a nucleic acid
sequence from Synechococcus leopoliensis (SEQ ID NO:110); Mldxrd represents a
nucleic acid sequence from Mycobacterium leprae (SEQ ID NO:111); Pmdxrd represents
a nucleic acid sequence from Pasteurella multocida (SEQ ID NO:112); Atdxrd represents
a nucleic acid sequence from Arabidopsis thaliana (SEQ ID NO:113); Cjdxrd represents
a nucleic acid sequence from Campylobacter jejuni (SEQ ID NO:114); and Pfdxrd
represents a nucleic acid sequence from Plasmodium falciparum (SEQ ID NO:115).

Figure 32 is a sequence pile-up of sixteen amino acid sequences of polypeptides
having DXR activity. Stdxrp represents the amino acid sequence set forth in SEQ ID
NO:97; Zmdxrp represents an amino acid sequence from Zymomonas mobilis (SEQ ID
NO:116); Padxrp represents an amino acid sequence from Pseudomonas aeruginosa
(SEQ ID NO:117); Ecdxrp represents an amino acid sequence from Escherichia coli
(SEQ ID NO:118); Nmdxrp represents an amino acid sequence from Neisseria
meningitidis (SEQ ID NO:119); Hidxrp represents an amino acid sequence from
Haemophilus influenzae (SEQ ID NO:120); Ssdxrp represents an amino acid sequence
from Synechocystis sp. PCC6803 (SEQ ID NO:121); Pmdxrp represents an amino acid
sequence from Pasteurella multocida (SEQ ID NO:122); Sldxrp represents an amino acid
sequence from Synechococcus leopoliensis (SEQ ID NO:123); Sgdxrp represents an
amino acid sequence from Streptomyces griseolosporeus (SEQ ID NO:124); Bsdxrp
represents an amino acid sequence from Bacillus subtilis (SEQ ID NO:125); Mldxrp
represents an amino acid sequence from Mycobacterium leprae (SEQ ID NO:126);
Mtdxrp represents an amino acid sequence from Mycobacterium tuberculosis (SEQ ID
NO:127); Atdxrp represents an amino acid sequence from Arabidopsis thaliana (SEQ ID
NO:128); Cjdxrp represents an amino acid sequence from Campylobacter jejuni (SEQ ID
NO:130); and Pfdxrp represents an amino acid sequence from Plasmodium falciparum
(SEQ ID NO:131).



DETAILED DESCRIPTION 

The invention provides methods and materials related to the production of
isoprenoids. Specifically, the invention provides isolated nucleic acids, substantially
pure polypeptides, host cells, and methods and materials for producing various isoprenoid 
compounds. For the purpose of this invention, an isoprenoid compound is any compound 
containing a five-carbon isoprenoid unit. Examples of isoprenoid compounds include,
without limitation, carotenoids, isoprenes, sterols, terpenes, and ubiquinones. Such
isoprenoid compounds can be used in a wide range of applications. For example, 
isoprenoid compounds produced as described herein can be used in industrial, 
pharmaceutical, or cosmetic products. 

In general terms, carotenoids are lipophilic pigments typically found in
photosynthetic plants and bacteria. Examples of carotenoids include, without limitation,
carotenes, xanthophylls, hydrocarbon carotenoids, hydroxy carotenoid derivatives, epoxy
carotenoid derivatives, furanoxy carotenoid derivatives, and oxy carotenoid derivatives.
Isoprenes are oily hydrocarbons that can be obtained by distilling caoutchouc or
gutta-percha. Examples of isoprenes include, without limitation, rubber, vitamin A, and
vitamin K. Sterols are steroid-based alcohols typically having a hydrocarbon side-chain
of eight to ten carbon atoms at the 17-beta position and a hydroxyl group at the 3-beta
position. Examples of sterols include, without limitation, ergosterol, cholesterol, and
stigmasterol. Terpenes are lipid species typically found in plants in great abundance.
Examples of terpenes include, without limitation, dolichol, squalene, and limonene.
Ubiquinones are 2,3-dimethoxy-5-methylbenzoquinone derivatives having a side chain
containing at least one isoprenoid unit. Typically, ubiquinone is referred to as Coenzyme
Q (CoQ). In addition, the number of isoprenoid units of a side chain of a particular
ubiquinone is used to identify that particular ubiquinone. For example, a ubiquinone with
six isoprenoid units is referred to as CoQ(6), while a ubiquinone with ten isoprenoid units
is referred to as CoQ(10). It is noted that CoQ(10) also is referred to as ubidecarenone.
Examples of ubiquinones include, without limitation, CoQ(6), CoQ(8), CoQ(10), and
CoQ(12).

Isoprenoid compounds can be pyruvate-derived products. The term
"pyruvate-derived product" as used herein refers to any compound that is synthesized from pyruvate
within no more than 25 enzymatic steps. Thus, an isoprenoid compound is not a
pyruvate-derived product if that isoprenoid compound is synthesized from pyruvate in
more than 25 enzymatic steps. An enzymatic step is a single chemical reaction catalyzed
by a polypeptide having enzymatic activity. The term "polypeptide having enzymatic
activity" as used herein refers to any polypeptide that catalyzes a chemical reaction of
other substances without itself being destroyed or altered upon completion of the reaction.
Typically, a polypeptide having enzymatic activity catalyzes the formation of one or more
products from one or more substrates. Such polypeptides can have any type of enzymatic
activity including, without limitation, the enzymatic activity associated with an enzyme
such as DXS, DDS, ODS, SDS, DXR (1-deoxy-D-xylulose 5-phosphate
reductoisomerase), ispD (4-diphosphocytidyl-2C-methyl-D-erythritol synthase), and ispE
(4-diphosphocytidyl-2C-methyl-D-erythritol kinase).

A polypeptide having a particular enzymatic activity can be a polypeptide that is 
either naturally-occurring or non-naturally-occurring. A naturally-occurring polypeptide 
is any polypeptide having an amino acid sequence as found in nature, including wild-type 
and polymorphic polypeptides. Such naturally-occurring polypeptides can be obtained
from any species including, without limitation, animal (e.g., mammalian), plant, fungal,
and bacterial species. A non-naturally-occurring polypeptide is any polypeptide having 






WO 02/26933 



PCT/US01/30328 



an amino acid sequence that is not found in nature. Thus, a non-naturally-occurring 
polypeptide can be a mutated version of a naturally-occurring polypeptide, or an 
engineered polypeptide. For example, a non-naturally-occurring polypeptide having DDS 
activity can be a mutated version of a naturally-occurring polypeptide having DDS
activity that retains at least some DDS activity. A polypeptide can be mutated by, for
example, sequence additions, deletions, substitutions, or combinations thereof. 

Examples of isoprenoid compounds that are pyruvate-derived products include,
without limitation, CoQ(6), CoQ(7), CoQ(8), CoQ(9), CoQ(10), astaxanthin,
canthaxanthin, lutein, zeaxanthin, beta-carotene, lycopene, capsanthin, bixin, norbixin,
crocetin, zeta-carotene, vitamin E, gibberellins, abscisic acid, ergosterol, geraniol, and
latex.

As depicted in Figure 1, multiple polypeptides can be used to convert glucose to
CoQ(10). For example, polypeptides having DXS, DXR, LytB, and DDS activity can be
used to convert glucose to CoQ(10). Such polypeptides can be obtained and used to make
CoQ(10) as described herein.

1. Nucleic acids 

The term "nucleic acid" as used herein encompasses both RNA and DNA, 
including cDNA, genomic DNA, and synthetic (e.g., chemically synthesized) DNA. The
nucleic acid can be double-stranded or single-stranded. Where single-stranded, the
nucleic acid can be the sense strand or the antisense strand. In addition, nucleic acid can
be circular or linear. 

The term "isolated" as used herein with reference to nucleic acid refers to a 
naturally-occurring nucleic acid that is not immediately contiguous with both of the
sequences with which it is immediately contiguous (one on the 5' end and one on the 3'
end) in the naturally-occurring genome of the organism from which it is derived. For 
example, an isolated nucleic acid can be, without limitation, a recombinant DNA 
molecule of any length, provided one of the nucleic acid sequences normally found 
immediately flanking that recombinant DNA molecule in a naturally-occurring genome is
removed or absent. Thus, an isolated nucleic acid includes, without limitation, a
recombinant DNA that exists as a separate molecule (e.g., a cDNA or a genomic DNA
fragment produced by PCR or restriction endonuclease treatment) independent of other
sequences as well as recombinant DNA that is incorporated into a vector, an 
autonomously replicating plasmid, a virus (e.g., a retrovirus, adenovirus, or herpes virus), 
or into the genomic DNA of a prokaryote or eukaryote. In addition, an isolated nucleic 
acid can include a recombinant DNA molecule that is part of a hybrid or fusion nucleic
acid sequence. 

The term "isolated" as used herein with reference to nucleic acid also includes any 
non-naturally-occurring nucleic acid since non-naturally-occurring nucleic acid sequences 
are not found in nature and do not have immediately contiguous sequences in a
naturally-occurring genome. For example, non-naturally-occurring nucleic acid such as an
engineered nucleic acid is considered to be isolated nucleic acid. Engineered nucleic acid
can be made using common molecular cloning or chemical nucleic acid synthesis 
techniques. Isolated non-naturally-occurring nucleic acid can be independent of other 
sequences, or incorporated into a vector, an autonomously replicating plasmid, a virus
(e.g., a retrovirus, adenovirus, or herpes virus), or the genomic DNA of a prokaryote or
eukaryote. In addition, a non-naturally-occurring nucleic acid can include a nucleic acid 
molecule that is part of a hybrid or fusion nucleic acid sequence. 

It will be apparent to those of skill in the art that a nucleic acid existing among 
hundreds to millions of other nucleic acid molecules within, for example, cDNA or
genomic libraries, or gel slices containing a genomic DNA restriction digest is not to be
considered an isolated nucleic acid. 

The term "exogenous" as used herein with reference to nucleic acid and a 
particular cell refers to any nucleic acid that does not originate from that particular cell as 
found in nature. Thus, all non-naturally-occurring nucleic acid is considered to be
exogenous to a cell once introduced into the cell. It is important to note that
non-naturally-occurring nucleic acid can contain nucleic acid sequences or fragments of
nucleic acid sequences that are found in nature provided the nucleic acid as a whole does
not exist in nature. For example, a nucleic acid molecule containing a genomic DNA
sequence within an expression vector is non-naturally-occurring nucleic acid, and thus is
exogenous to a cell once introduced into the cell, since that nucleic acid molecule as a
whole (genomic DNA plus vector DNA) does not exist in nature. Thus, any vector,
autonomously replicating plasmid, or virus (e.g., retrovirus, adenovirus, or herpes virus)
that as a whole does not exist in nature is considered to be non-naturally-occurring 
nucleic acid. It follows that genomic DNA fragments produced by PCR or restriction 
endonuclease treatment as well as cDNAs are considered to be non-naturally-occurring
nucleic acid since they exist as separate molecules not found in nature. It also follows
that any nucleic acid containing a promoter sequence and polypeptide-encoding sequence
(e.g., cDNA or genomic DNA) in an arrangement not found in nature is
non-naturally-occurring nucleic acid.

Nucleic acid that is naturally-occurring can be exogenous to a particular cell. For
example, an entire chromosome isolated from a cell of person X is an exogenous nucleic
acid with respect to a cell of person Y once that chromosome is introduced into Y's cell. 

The invention provides isolated nucleic acid that contains a nucleic acid sequence 
having (1) a length, and (2) a percent identity to an identified nucleic acid sequence over 
that length. The invention also provides isolated nucleic acid that contains a nucleic acid
sequence encoding a polypeptide that contains an amino acid sequence having (1) a
length, and (2) a percent identity to an identified amino acid sequence over that length. 
Typically, the identified nucleic acid or amino acid sequence is a sequence referenced by 
a particular sequence identification number, and the nucleic acid or amino acid sequence 
being compared to the identified sequence is referred to as the target sequence. For
example, an identified sequence can be the sequence set forth in SEQ ID NO:1.

A length and percent identity over that length for any nucleic acid or amino acid
sequence is determined as follows. First, a nucleic acid or amino acid sequence is
compared to the identified nucleic acid or amino acid sequence using the BLAST 2
Sequences (Bl2seq) program from the stand-alone version of BLASTZ containing
BLASTN version 2.0.14 and BLASTP version 2.0.14. This stand-alone version of
BLASTZ can be obtained from the University of Wisconsin library as well as at
www.fr.com or www.ncbi.nlm.nih.gov. Instructions explaining how to use the Bl2seq
program can be found in the readme file accompanying BLASTZ. Bl2seq performs a
comparison between two sequences using either the BLASTN or BLASTP algorithm.
BLASTN is used to compare nucleic acid sequences, while BLASTP is used to compare
amino acid sequences. To compare two nucleic acid sequences, the options are set as
follows: -i is set to a file containing the first nucleic acid sequence to be compared (e.g.,
C:\seq1.txt); -j is set to a file containing the second nucleic acid sequence to be compared
(e.g., C:\seq2.txt); -p is set to blastn; -o is set to any desired file name (e.g., C:\output.txt);
-q is set to -1; -r is set to 2; and all other options are left at their default setting. For
example, the following command can be used to generate an output file containing a
comparison between two sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastn -o
c:\output.txt -q -1 -r 2. To compare two amino acid sequences, the options of Bl2seq are
set as follows: -i is set to a file containing the first amino acid sequence to be compared
(e.g., C:\seq1.txt); -j is set to a file containing the second amino acid sequence to be
compared (e.g., C:\seq2.txt); -p is set to blastp; -o is set to any desired file name (e.g.,
C:\output.txt); and all other options are left at their default setting. For example, the
following command can be used to generate an output file containing a comparison
between two amino acid sequences: C:\Bl2seq -i c:\seq1.txt -j c:\seq2.txt -p blastp -o
c:\output.txt. If the target sequence shares homology with any portion of the identified
sequence, then the designated output file will present those regions of homology as
aligned sequences. If the target sequence does not share homology with any portion of
the identified sequence, then the designated output file will not present aligned sequences.
Once aligned, a length is determined by counting the number of consecutive nucleotides
or amino acid residues from the target sequence presented in alignment with sequence
from the identified sequence starting with any matched position and ending with any
other matched position. A matched position is any position where an identical nucleotide
or amino acid residue is presented in both the target and identified sequence. Gaps
presented in the target sequence are not counted since gaps are not nucleotides or amino
acid residues. Likewise, gaps presented in the identified sequence are not counted since
target sequence nucleotides or amino acid residues are counted, not nucleotides or amino
acid residues from the identified sequence.
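
The option settings described above can be collected into an argument list, for example for use with a process launcher. The following is a minimal Python sketch only: the function name build_bl2seq_command and the default executable path are illustrative assumptions, while the flags (-i, -j, -p, -o, -q, -r) and their values are those recited in the text.

```python
from typing import List


def build_bl2seq_command(first: str, second: str, program: str, output: str,
                         executable: str = r"C:\Bl2seq") -> List[str]:
    """Assemble a Bl2seq argument list (hypothetical helper, not part of BLAST)."""
    if program not in ("blastn", "blastp"):
        raise ValueError("program must be 'blastn' or 'blastp'")
    command = [executable, "-i", first, "-j", second, "-p", program, "-o", output]
    if program == "blastn":
        # Nucleic acid comparisons additionally set -q to -1 and -r to 2;
        # amino acid comparisons leave all other options at their defaults.
        command += ["-q", "-1", "-r", "2"]
    return command


nucleic = build_bl2seq_command(r"c:\seq1.txt", r"c:\seq2.txt", "blastn", r"c:\output.txt")
protein = build_bl2seq_command(r"c:\seq1.txt", r"c:\seq2.txt", "blastp", r"c:\output.txt")
```

Joining either list with spaces gives the corresponding command line recited above.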

The percent identity over a determined length is determined by counting the
number of matched positions over that length and dividing that number by the length
followed by multiplying the resulting value by 100. For example, if (1) a 1000 nucleotide
target sequence is compared to the sequence set forth in SEQ ID NO:1, (2) the Bl2seq
program presents 200 nucleotides from the target sequence aligned with a region of the
sequence set forth in SEQ ID NO:1 where the first and last nucleotides of that 200
nucleotide region are matches, and (3) the number of matches over those 200 aligned
nucleotides is 180, then the 1000 nucleotide target sequence contains a length of 200 and
a percent identity over that length of 90 (i.e., 180 ÷ 200 × 100 = 90).
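
The counting rules above, including the treatment of gaps, can be illustrated with a short Python sketch. It assumes that the aligned region is supplied as two equal-length strings in which "-" marks a gap; that input format and the function name length_and_identity are assumptions made for illustration and do not describe the Bl2seq output format.

```python
from typing import Tuple

GAP = "-"  # assumed gap symbol for this sketch


def length_and_identity(aligned_target: str, aligned_identified: str) -> Tuple[int, float]:
    """Return (length, unrounded percent identity) for one aligned region."""
    if len(aligned_target) != len(aligned_identified):
        raise ValueError("aligned strings must have the same number of columns")
    columns = list(zip(aligned_target, aligned_identified))

    def is_match(t: str, i: str) -> bool:
        return t == i and t != GAP

    # The region must start and end on matched positions.
    if not columns or not is_match(*columns[0]) or not is_match(*columns[-1]):
        raise ValueError("region must start and end on matched positions")
    # Only target residues are counted; gap columns in the target add nothing.
    length = sum(1 for t, _ in columns if t != GAP)
    matches = sum(1 for t, i in columns if is_match(t, i))
    return length, 100 * matches / length


# 200 target nucleotides, 180 of them matched: length 200, percent identity 90.
example = length_and_identity("A" + "C" * 20 + "A" * 179, "A" * 200)
```

A gap in the identified sequence still counts toward the length because the target residue opposite it is counted, whereas a gap in the target sequence does not.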

It will be appreciated that a single nucleic acid or amino acid target sequence that
aligns with an identified sequence can have many different lengths with each length
having its own percent identity. For example, a target sequence containing a 20
nucleotide region that aligns with an identified sequence as follows has many different
lengths including those listed in Table 1.

                     1                 20
Target Sequence:     AGGTCGTGTACTGTCAGTCA
                     | || ||| |||| |||| |
Identified Sequence: ACGTGGTGAACTGCCAGTGA

Table 1.

Starting    Ending                  Matched      Percent
Position    Position    Length      Positions    Identity
1           20          20          15           75.0
1           18          18          14           77.8
1           15          15          11           73.3
6           20          15          12           80.0
6           17          12          10           83.3
6           15          10          8            80.0
8           20          13          10           76.9
8           16          9           7            77.8

It is noted that the percent identity value is rounded to the nearest tenth. For
example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16,
78.17, 78.18, and 78.19 are rounded up to 78.2. It is also noted that the length value will
always be an integer.
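
The entries of Table 1 and the rounding rule can be reproduced with a short Python sketch. The function names are illustrative assumptions, and the sketch handles only an ungapped alignment such as the 20 nucleotide example. Decimal arithmetic with ROUND_HALF_UP is used so that a value such as 78.15 rounds up to 78.2 as stated above; Python's built-in round() does not guarantee this behavior.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import List, Tuple

TARGET = "AGGTCGTGTACTGTCAGTCA"
IDENTIFIED = "ACGTGGTGAACTGCCAGTGA"


def percent_identity(matches: int, length: int) -> Decimal:
    """Percent identity rounded half-up to the nearest tenth."""
    value = Decimal(matches) * 100 / Decimal(length)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def windows(target: str, identified: str) -> List[Tuple[int, int, int, int, Decimal]]:
    """List (start, end, length, matches, percent identity) for every window that
    starts on a matched position and ends on another matched position (1-based)."""
    matched = [t == i for t, i in zip(target, identified)]
    rows = []
    for start in range(len(matched)):
        if not matched[start]:
            continue
        for end in range(start + 1, len(matched)):
            if matched[end]:
                length = end - start + 1
                count = sum(matched[start:end + 1])
                rows.append((start + 1, end + 1, length, count,
                             percent_identity(count, length)))
    return rows


rows = windows(TARGET, IDENTIFIED)
```

The eight rows of Table 1 appear among the windows returned for the example alignment.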

The invention provides an isolated nucleic acid containing a nucleic acid sequence 
that has at least one length and percent identity over that length as determined above such 
that the point defined by that length and percent identity is within the area defined by 
points A, B, C, and D of Figure 26. In addition, the invention provides an isolated nucleic
acid containing a nucleic acid sequence that encodes a polypeptide containing an amino
acid sequence that has at least one length and percent identity over that length as 
determined above such that the point defined by that length and percent identity is within 
the area defined by points A, B, C, and D of Figure 26. The point defined by a length and 
percent identity over that length is that point on the X/Y coordinate of Figure 26 where
the X axis is the length and the Y axis is the percent identity. Thus, the point defined by a 
nucleic acid sequence with a length of 200 and a percent identity of 90 has coordinates 
(200, 90). For the purpose of this invention, any point that falls on point A, B, C, or D is 
considered within the area defined by points A, B, C, and D of Figure 26. Likewise, any
point that falls on a line that defines the area defined by points A, B, C, and D is
considered within the area defined by points A, B, C, and D of Figure 26. 

It will be appreciated that the term "the area defined by points A, B, C, and D of 
Figure 26" as used herein refers to that area defined by the lines that connect point A with 
point B, point B with point C, point C with point D, and point D with point A. Points A,
B, C, and D can define an area having any shape defined by four points (e.g., square,
rectangle, or rhombus). In addition, two or more points can have the same coordinates. 
For example, points B and C can have identical coordinates. In this case, the area defined 
by points A, B, C, and D of Figure 26 is triangular. If three points have identical 
coordinates, then the area defined by points A, B, C, and D of Figure 26 is a line. In this
case, any point that falls on that line would be considered within the area defined by
points A, B, C, and D of Figure 26. If all four points have identical coordinates, then the
area defined by points A, B, C, and D of Figure 26 is a point. In all cases, simple
algebraic equations can be used to determine whether a point is within the area defined by 
points A, B, C, and D of Figure 26. 
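
One way such a determination could be made is sketched below in Python. The sketch assumes that points A, B, C, and D are listed in order around a convex area; the function name within_area is an illustrative assumption. A bounding-box check is combined with a same-side (cross product) check so that points on a corner or edge count as within the area and so that the degenerate triangle, line, and point cases described above are handled.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (length, percent identity)


def _cross(o: Point, a: Point, p: Point) -> float:
    """Cross product of vectors o->a and o->p; its sign gives the side of the line."""
    return (a[0] - o[0]) * (p[1] - o[1]) - (a[1] - o[1]) * (p[0] - o[0])


def within_area(p: Point, corners: Sequence[Point]) -> bool:
    """True if p lies inside or on the boundary of the area A-B-C-D (assumed convex)."""
    if len(corners) != 4:
        raise ValueError("exactly four points (A, B, C, D) are required")
    xs = [c[0] for c in corners]
    ys = [c[1] for c in corners]
    # Bounding-box check; this also settles the degenerate line and point cases.
    if not (min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)):
        return False
    # p must not lie on opposite sides of any two edges.
    crosses = [_cross(corners[k], corners[(k + 1) % 4], p) for k in range(4)]
    return all(c >= 0 for c in crosses) or all(c <= 0 for c in crosses)


area = [(3626, 100), (3626, 95), (1900, 95), (1900, 100)]
inside = within_area((2000, 97.5), area)
outside = within_area((200, 90), area)
```

With the illustrative coordinates shown, the point (2000, 97.5) is within the area, while the point (200, 90) is not.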

It is noted that Figure 26 is a graphical representation presenting possible
positions of points A, B, C, and D. The shaded area illustrated in Figure 26 represents
one possible example, while the arrows indicate that other positions for points A, B, C, 
and D are possible. In fact, points A, B, C, and D can have any X coordinate and any Y 
coordinate. For example, point A can have an X coordinate equal to the number of
nucleotides or amino acid residues in an identified sequence, and a Y coordinate of 100.
Point B can have an X coordinate equal to the number of nucleotides or amino acid
residues in an identified sequence, and a Y coordinate less than or equal to 100 (e.g., 50,
55, 65, 70, 75, 80, 85, 90, 95, and 99). Point C can have an X coordinate equal to a
percent (e.g., 1, 2, 5, 10, 15, or more percent) of the number of nucleotides or amino acid
residues in an identified sequence, and a Y coordinate less than or equal to 100 (e.g., 50,
55, 65, 70, 75, 80, 85, 90, 95, and 99). Point D can have an X coordinate equal to the
length of a typical PCR primer (e.g., 12, 13, 14, 15, 16, 17, or more) or antigenic 
polypeptide (e.g., 5, 6, 7, 8, 9, 10, 11, 12, or more), and a Y coordinate less than or equal 
to 100 (e.g., 50, 55, 65, 70, 75, 80, 85, 90, 95, and 99). 

An isolated nucleic acid containing a nucleic acid sequence having a length and a
percent identity to the sequence set forth in SEQ ID NO:1 over that length is within the
scope of the invention provided the point defined by that length and percent identity is 
within the area defined by points A, B, C, and D of Figure 26; where point A has an X 
coordinate less than or equal to 3626, and a Y coordinate less than or equal to 100; where 
point B has an X coordinate less than or equal to 3626, and a Y coordinate greater than or
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y
coordinate greater than or equal to 65; and where point D has an X coordinate greater 
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X 
coordinate for point A can be 3626, 3600, 3500, 3000, 2500, or less; and the Y coordinate 
for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for point B can
be 3626, 3600, 3500, 3000, 2500, or less; and the Y coordinate for point B can be 65, 70,
75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can be 50, 60, 70, 80, 90, 
100, 150, 200, or more; and the Y coordinate for point C can be 65, 70, 75, 80, 85, 90, 95, 
99 or more. The X coordinate for point D can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 
30, 40, 50, 75, 100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85,
80, 75, or less. In one embodiment, point A can be (3626, 100), point B can be (3626,
95), point C can be (1900, 95), and point D can be (1900, 100). 
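
The coordinate limits recited above for SEQ ID NO:1 can be checked mechanically. The following Python sketch is illustrative only: the names Bounds and points_allowed are assumptions, and the numeric limits are those stated in the preceding paragraph.

```python
from typing import NamedTuple, Tuple

Point = Tuple[float, float]  # (X = length, Y = percent identity)


class Bounds(NamedTuple):
    max_length: int       # upper limit on the X coordinate of points A and B
    min_identity: float   # lower limit on the Y coordinate of points B and C
    min_c_length: int     # lower limit on the X coordinate of point C
    min_d_length: int     # lower limit on the X coordinate of point D


# Limits recited for the sequence set forth in SEQ ID NO:1.
SEQ_ID_NO_1 = Bounds(max_length=3626, min_identity=65, min_c_length=50, min_d_length=12)


def points_allowed(a: Point, b: Point, c: Point, d: Point, bounds: Bounds) -> bool:
    """True if points A, B, C, and D satisfy the recited coordinate limits."""
    return (a[0] <= bounds.max_length and a[1] <= 100
            and b[0] <= bounds.max_length and b[1] >= bounds.min_identity
            and c[0] >= bounds.min_c_length and c[1] >= bounds.min_identity
            and d[0] >= bounds.min_d_length and d[1] <= 100)


embodiment_ok = points_allowed((3626, 100), (3626, 95), (1900, 95), (1900, 100), SEQ_ID_NO_1)
```

The one embodiment recited above, with points (3626, 100), (3626, 95), (1900, 95), and (1900, 100), satisfies these limits.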

An isolated nucleic acid containing a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:2 over that length is within the 
scope of the invention provided the point defined by that length and percent identity is
within the area defined by points A, B, C, and D of Figure 26; where point A has an X
coordinate less than or equal to 1926, and a Y coordinate less than or equal to 100; where
point B has an X coordinate less than or equal to 1926, and a Y coordinate greater than or
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y 
coordinate greater than or equal to 65; and where point D has an X coordinate greater 
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X 
coordinate for point A can be 1926, 1900, 1850, 1800, 1750, or less; and the Y coordinate
for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for point B can 
be 1926, 1900, 1850, 1800, 1750, or less; and the Y coordinate for point B can be 65, 70, 
75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can be 50, 60, 70, 80, 90, 
100, 150, 200, or more; and the Y coordinate for point C can be 65, 70, 75, 80, 85, 90, 95,
99 or more. The X coordinate for point D can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25,
30, 40, 50, 75, 100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 
80, 75, or less. In one embodiment, point A can be (1926, 100), point B can be (1926, 
95), point C can be (1000, 95), and point D can be (1000, 100). 

An isolated nucleic acid containing a nucleic acid sequence that encodes a
polypeptide containing an amino acid sequence having a length and a percent identity to
the sequence set forth in SEQ ID NO:3 over that length is within the scope of the 
invention provided the point defined by that length and percent identity is within the area 
defined by points A, B, C, and D of Figure 26; where point A has an X coordinate less 
than or equal to 641, and a Y coordinate less than or equal to 100; where point B has an X
coordinate less than or equal to 641, and a Y coordinate greater than or equal to 50; where
point C has an X coordinate greater than or equal to 25, and a Y coordinate greater than 
or equal to 50; and where point D has an X coordinate greater than or equal to 5, and a Y 
coordinate less than or equal to 100. For example, the X coordinate for point A can be 
641, 635, 630, 625, 620, or less; and the Y coordinate for point A can be 100, 99, 95, 90,
85, 80, 75, or less. The X coordinate for point B can be 641, 635, 630, 625, 620, or less;
and the Y coordinate for point B can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. 
The X coordinate for point C can be 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, or 
more; and the Y coordinate for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or 
more. The X coordinate for point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75,
100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less.
In one embodiment, point A can be (641, 100), point B can be (641, 95), point C can be
(400, 95), and point D can be (400, 100).

An isolated nucleic acid containing a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:37 over that length is within the
scope of the invention provided the point defined by that length and percent identity is
within the area defined by points A, B, C, and D of Figure 26; where point A has an X
coordinate less than or equal to 1990, and a Y coordinate less than or equal to 100; where 
point B has an X coordinate less than or equal to 1990, and a Y coordinate greater than or 
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y 
coordinate greater than or equal to 65; and where point D has an X coordinate greater
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X
coordinate for point A can be 1990, 1950, 1900, 1850, 1800, 1750, or less; and the Y 
coordinate for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for 
point B can be 1990, 1950, 1900, 1850, 1800, 1750, or less; and the Y coordinate for 
point B can be 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can
be 50, 60, 70, 80, 90, 100, 150, 200, or more; and the Y coordinate for point C can be 65,
70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point D can be 12, 13, 14, 15, 
16, 17, 18, 19, 20, 25, 30, 40, 50, 75, 100, or more; and the Y coordinate for point D can 
be 100, 99, 95, 90, 85, 80, 75, or less. In one embodiment, point A can be (1990, 100), 
point B can be (1990, 95), point C can be (1000, 95), and point D can be (1000, 100). 

An isolated nucleic acid containing a nucleic acid sequence having a length and a
percent identity to the sequence set forth in SEQ ID NO:38 over that length is within the
scope of the invention provided the point defined by that length and percent identity is 
within the area defined by points A, B, C, and D of Figure 26; where point A has an X 
coordinate less than or equal to 1002, and a Y coordinate less than or equal to 100; where
point B has an X coordinate less than or equal to 1002, and a Y coordinate greater than or
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y 
coordinate greater than or equal to 65; and where point D has an X coordinate greater 
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X 
coordinate for point A can be 1002, 950, 900, 850, 800, 750, or less; and the Y coordinate
for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for point B can
be 1002, 950, 900, 850, 800, 750, or less; and the Y coordinate for point B can be 65, 70,
75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can be 50, 60, 70, 80, 90,
100, 150, 200, or more; and the Y coordinate for point C can be 65, 70, 75, 80, 85, 90, 95, 
99 or more. The X coordinate for point D can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 
30, 40, 50, 75, 100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 
80, 75, or less. In one embodiment, point A can be (1002, 100), point B can be (1002,
95), point C can be (500, 95), and point D can be (500, 100). 

An isolated nucleic acid containing a nucleic acid sequence that encodes a 
polypeptide containing an amino acid sequence having a length and a percent identity to 
the sequence set forth in SEQ ID NO:39 over that length is within the scope of the
invention provided the point defined by that length and percent identity is within the area
defined by points A, B, C, and D of Figure 26; where point A has an X coordinate less 
than or equal to 333, and a Y coordinate less than or equal to 100; where point B has an X 
coordinate less than or equal to 333, and a Y coordinate greater than or equal to 50; where 
point C has an X coordinate greater than or equal to 25, and a Y coordinate greater than
or equal to 50; and where point D has an X coordinate greater than or equal to 5, and a Y
coordinate less than or equal to 100. For example, the X coordinate for point A can be 
333, 330, 325, 320, 315, or less; and the Y coordinate for point A can be 100, 99, 95, 90,
85, 80, 75, or less. The X coordinate for point B can be 333, 330, 325, 320, 315, or less;
and the Y coordinate for point B can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more.
The X coordinate for point C can be 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, or
more; and the Y coordinate for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or 
more. The X coordinate for point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 
100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. 
In one embodiment, point A can be (333, 100), point B can be (333, 95), point C can be
(150, 95), and point D can be (150, 100).

An isolated nucleic acid containing a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:40 over that length is within the 
scope of the invention provided the point defined by that length and percent identity is 
within the area defined by points A, B, C, and D of Figure 26; where point A has an X
coordinate less than or equal to 1833, and a Y coordinate less than or equal to 100; where
point B has an X coordinate less than or equal to 1833, and a Y coordinate greater than or
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y
coordinate greater than or equal to 65; and where point D has an X coordinate greater 
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X 
coordinate for point A can be 1833, 1800, 1750, 1700, 1650, or less; and the Y coordinate
for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for point B can
be 1833, 1800, 1750, 1700, 1650, or less; and the Y coordinate for point B can be 65, 70, 
75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can be 50, 60, 70, 80, 90, 
100, 150, 200, or more; and the Y coordinate for point C can be 65, 70, 75, 80, 85, 90, 95, 
99 or more. The X coordinate for point D can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25,
30, 40, 50, 75, 100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85,
80, 75, or less. In one embodiment, point A can be (1833, 100), point B can be (1833,
95), point C can be (900, 95), and point D can be (900, 100). 

An isolated nucleic acid containing a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:41 over that length is within the
scope of the invention provided the point defined by that length and percent identity is
within the area defined by points A, B, C, and D of Figure 26; where point A has an X 
coordinate less than or equal to 1014, and a Y coordinate less than or equal to 100; where 
point B has an X coordinate less than or equal to 1014, and a Y coordinate greater than or 
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y
coordinate greater than or equal to 65; and where point D has an X coordinate greater
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X 
coordinate for point A can be 1014, 950, 900, 800, 700, 600, or less; and the Y coordinate 
for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for point B can 
be 1014, 950, 900, 800, 700, 600, or less; and the Y coordinate for point B can be 65, 70,
75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can be 50, 60, 70, 80, 90,
100, 150, 200, or more; and the Y coordinate for point C can be 65, 70, 75, 80, 85, 90, 95,
99 or more. The X coordinate for point D can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 
30, 40, 50, 75, 100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 
80, 75, or less. In one embodiment, point A can be (1014, 100), point B can be (1014,
95), point C can be (500, 95), and point D can be (500, 100).

An isolated nucleic acid containing a nucleic acid sequence that encodes a 
polypeptide containing an amino acid sequence having a length and a percent identity to
the sequence set forth in SEQ ID NO:42 over that length is within the scope of the 
invention provided the point defined by that length and percent identity is within the area 
defined by points A, B, C, and D of Figure 26; where point A has an X coordinate less
than or equal to 337, and a Y coordinate less than or equal to 100; where point B has an X
coordinate less than or equal to 337, and a Y coordinate greater than or equal to 50; where 
point C has an X coordinate greater than or equal to 25, and a Y coordinate greater than 
or equal to 50; and where point D has an X coordinate greater than or equal to 5, and a Y 
coordinate less than or equal to 100. For example, the X coordinate for point A can be
337, 335, 330, 325, 320, 315, or less; and the Y coordinate for point A can be 100, 99, 95,
90, 85, 80, 75, or less. The X coordinate for point B can be 337, 335, 330, 325, 320, 315, 
or less; and the Y coordinate for point B can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 
or more. The X coordinate for point C can be 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 
200, or more; and the Y coordinate for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90,
95, 99 or more. The X coordinate for point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40,
50, 75, 100, or more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, 
or less. In one embodiment, point A can be (337, 100), point B can be (337, 95), point C 
can be (150, 95), and point D can be (150, 100). 

An isolated nucleic acid containing a nucleic acid sequence having a length and a
percent identity to the sequence set forth in SEQ ID NO:95 over that length is within the
scope of the invention provided the point defined by that length and percent identity is 
within the area defined by points A, B, C, and D of Figure 26; where point A has an X 
coordinate less than or equal to 2017, and a Y coordinate less than or equal to 100; where 
point B has an X coordinate less than or equal to 2017, and a Y coordinate greater than or
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y
coordinate greater than or equal to 65; and where point D has an X coordinate greater 
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X 
coordinate for point A can be 2017, 2000, 1900, 1950, 1800, 1700, 1600, or less; and the 
Y coordinate for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for
point B can be 2017, 2000, 1900, 1950, 1800, 1700, 1600, or less; and the Y coordinate
for point B can be 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point C
can be 50, 60, 70, 80, 90, 100, 150, 200, 500, 1000, 1500, or more; and the Y coordinate
for point C can be 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point D 
can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 75, 100, 250, 500, 1000, 1500, or 
more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one 
embodiment, point A can be (2017, 100), point B can be (2017, 95), point C can be
(1800, 95), and point D can be (1800, 100). 

An isolated nucleic acid containing a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:96 over that length is within the 
scope of the invention provided the point defined by that length and percent identity is
within the area defined by points A, B, C, and D of Figure 26; where point A has an X
coordinate less than or equal to 1161, and a Y coordinate less than or equal to 100; where
point B has an X coordinate less than or equal to 1161, and a Y coordinate greater than or
equal to 65; where point C has an X coordinate greater than or equal to 50, and a Y 
coordinate greater than or equal to 65; and where point D has an X coordinate greater
than or equal to 12, and a Y coordinate less than or equal to 100. For example, the X
coordinate for point A can be 1161, 1050, 1000, 950, 900, 800, 700, 600, or less; and the
Y coordinate for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for 
point B can be 1161, 1050, 1000, 950, 900, 800, 700, 600, or less; and the Y coordinate
for point B can be 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point C
can be 50, 60, 70, 80, 90, 100, 150, 200, 250, 500, 1000, or more; and the Y coordinate
for point C can be 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point D 
can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 75, 100, 250, 500, 1000, or more; 
and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one 
embodiment, point A can be (1161, 100), point B can be (1161, 95), point C can be
(1000, 95), and point D can be (1000, 100).

An isolated nucleic acid containing a nucleic acid sequence that encodes a 
polypeptide containing an amino acid sequence having a length and a percent identity to 
the sequence set forth in SEQ ID NO:97 over that length is within the scope of the 
invention provided the point defined by that length and percent identity is within the area
defined by points A, B, C, and D of Figure 26; where point A has an X coordinate less
than or equal to 386, and a Y coordinate less than or equal to 100; where point B has an X
coordinate less than or equal to 386, and a Y coordinate greater than or equal to 50; where
point C has an X coordinate greater than or equal to 25, and a Y coordinate greater than 
or equal to 50; and where point D has an X coordinate greater than or equal to 5, and a Y 
coordinate less than or equal to 100. For example, the X coordinate for point A can be 
386, 380, 375, 370, 375, 360, 365, 350, 325, 300, or less; and the Y coordinate for point
A can be 100, 99, 95, 90, 85, 80, 75, or less. The X coordinate for point B can be 386, 
380, 375, 370, 375, 360, 365, 350, 325, 300, or less; and the Y coordinate for point B can 
be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point C can be 
25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, 300, 350, or more; and the Y coordinate
for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for
point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 200, 300, 350, or more; 
and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one 
embodiment, point A can be (386, 100), point B can be (386, 95), point C can be (350, 
95), and point D can be (350, 100). 

The invention also provides isolated nucleic acid that is at least about 12 bases in
length (e.g., at least about 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, 60, 100, 250, 500,
750, 1000, 1500, 2000, 3000, 4000, or 5000 bases in length) and hybridizes, under
hybridization conditions, to the sense or antisense strand of a nucleic acid having the
sequence set forth in SEQ ID NO:1, 2, 37, 38, 40, 41, 95, or 96. The hybridization
conditions can be moderately or highly stringent hybridization conditions.

For the purpose of this invention, moderately stringent hybridization conditions 
mean the hybridization is performed at about 42°C in a hybridization solution containing 
25 mM KPO4 (pH 7.4), 5X SSC, 5X Denhardt's solution, 50 µg/mL denatured, sonicated
salmon sperm DNA, 50% formamide, 10% Dextran sulfate, and 1-15 ng/mL probe (about
5x10^7 cpm/µg), while the washes are performed at about 50°C with a wash solution
containing 2X SSC and 0.1% sodium dodecyl sulfate. 

Highly stringent hybridization conditions mean the hybridization is performed at 
about 42°C in a hybridization solution containing 25 mM KPO4 (pH 7.4), 5X SSC, 5X 
Denhardt's solution, 50 µg/mL denatured, sonicated salmon sperm DNA, 50% formamide,
10% Dextran sulfate, and 1-15 ng/mL probe (about 5x10^7 cpm/µg), while the washes are
performed at about 65°C with a wash solution containing 0.2X SSC and 0.1% sodium
dodecyl sulfate.
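
For ease of comparison, the two sets of conditions defined above can be recorded as data. The following Python sketch is illustrative only; the class and field names are hypothetical, and only the temperatures and wash solution components recited above are captured, since the hybridization solution is the same in both definitions.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class HybridizationConditions:
    hybridization_temp_c: int  # approximate hybridization temperature (deg C)
    wash_temp_c: int           # approximate wash temperature (deg C)
    wash_ssc_x: float          # SSC concentration of the wash solution (X)
    wash_sds_percent: float    # sodium dodecyl sulfate in the wash solution (%)


# Values as recited in the text above.
MODERATELY_STRINGENT = HybridizationConditions(42, 50, 2.0, 0.1)
HIGHLY_STRINGENT = HybridizationConditions(42, 65, 0.2, 0.1)


def differing_fields(first, second):
    """List the names of the fields whose values differ between two condition sets."""
    return [f.name for f in fields(first)
            if getattr(first, f.name) != getattr(second, f.name)]


print(differing_fields(MODERATELY_STRINGENT, HIGHLY_STRINGENT))
```

As the comparison shows, the two definitions differ only in the wash step: the wash temperature and the SSC concentration of the wash solution.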

Isolated nucleic acid within the scope of the invention can be obtained using any 
method including, without limitation, common molecular cloning and chemical nucleic 
acid synthesis techniques. For example, PCR can be used to obtain an isolated nucleic 
acid containing a nucleic acid sequence sharing similarity to the sequence set forth in
SEQ ID NO:1, 2, 37, 38, 40, 41, 95, or 96. PCR refers to a procedure or technique in
which target nucleic acid is amplified in a manner similar to that described in U.S. Patent 
No. 4,683,195, and subsequent modifications of the procedure described therein. 
Generally, sequence information from the ends of the region of interest or beyond is
used to design oligonucleotide primers that are identical or similar in sequence to
opposite strands of a potential template to be amplified. Using PCR, a nucleic acid 
sequence can be amplified from RNA or DNA. For example, a nucleic acid sequence can 
be isolated by PCR amplification from total cellular RNA, total genomic DNA, and 
cDNA as well as from bacteriophage sequences, plasmid sequences, viral sequences, and
the like. When using RNA as a source of template, reverse transcriptase can be used to
synthesize complementary DNA strands.
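
The relationship between a pair of primers and the opposite strands of a template can be sketched as follows. This Python sketch is a simplified illustration, not a primer design method: it takes the forward primer from the start of the region of interest and the reverse primer as the reverse complement of the end of that region, and it ignores practical considerations such as melting temperature. The function names and the example template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return sequence.translate(COMPLEMENT)[::-1]


def design_primers(template, start, end, primer_length=20):
    """Return (forward, reverse) primers flanking template[start:end].

    start and end are 0-based, end-exclusive positions on the given strand.
    """
    region = template[start:end]
    if len(region) < primer_length:
        raise ValueError("region of interest is shorter than the primer length")
    forward = region[:primer_length]                       # identical to the given strand
    reverse = reverse_complement(region[-primer_length:])  # identical to the opposite strand
    return forward, reverse


# Hypothetical 13-base template, with short primers for readability.
print(design_primers("ATGGCTAAACCGG", 0, 13, primer_length=6))
```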

An isolated nucleic acid within the scope of the invention also can be obtained by 
mutagenesis. For example, an isolated nucleic acid containing a sequence set forth in 
SEQ ID NO:1, 2, 37, 38, 40, 41, 95, or 96 can be mutated using common molecular
cloning techniques (e.g., site-directed mutagenesis). Possible mutations include, without
limitation, deletions, insertions, and substitutions, as well as combinations of deletions, 
insertions, and substitutions. 

In addition, nucleic acid and amino acid databases (e.g., GenBank®) can be used 
to obtain an isolated nucleic acid within the scope of the invention. For example, any
nucleic acid sequence having some homology to a sequence set forth in SEQ ID NO:1, 2,
37, 38, 40, 41, 95, or 96, or any amino acid sequence having some homology to a 
sequence set forth in SEQ ID NO:3, 39, 42, or 97 can be used as a query to search 
GenBank®. 

Further, nucleic acid hybridization techniques can be used to obtain an isolated 
nucleic acid within the scope of the invention. Briefly, any nucleic acid having some
homology to a sequence set forth in SEQ ID NO:1, 2, 37, 38, 40, 41, 95, or 96 can be
used as a probe to identify a similar nucleic acid by hybridization under conditions of
moderate to high stringency. Once identified, the nucleic acid then can be purified, 
sequenced, and analyzed to determine whether it is within the scope of the invention as 
described herein. 

Hybridization can be done by Southern or Northern analysis to identify a DNA or
RNA sequence, respectively, that hybridizes to a probe. The probe can be labeled with
biotin, digoxigenin, an enzyme, or a radioisotope such as 32P. The DNA or RNA to be
analyzed can be electrophoretically separated on an agarose or polyacrylamide gel, 
transferred to nitrocellulose, nylon, or other suitable membrane, and hybridized with the
probe using standard techniques well known in the art such as those described in sections
7.39-7.52 of Sambrook et al., (1989) Molecular Cloning, second edition, Cold Spring
Harbor Laboratory, Plainview, NY. Typically, a probe is at least about 20 nucleotides in
length. For example, a probe corresponding to a 20 nucleotide sequence set forth in SEQ
ID NO:1, 2, 37, 38, 40, 41, 95, or 96 can be used to identify an identical or similar nucleic
acid. In addition, probes longer or shorter than 20 nucleotides can be used.

The invention provides isolated nucleic acid that contains the entire nucleic acid 
sequence depicted in Figure 2, 3, 7, 8, 10, 11, 28, or 29. In addition, the invention
provides isolated nucleic acid that contains a portion of the nucleic acid sequence
depicted in Figure 2, 3, 7, 8, 10, 11, 28, or 29. For example, the invention provides
isolated nucleic acid that contains a 15 nucleotide sequence identical to any 15 nucleotide
sequence depicted in Figure 2, 3, 7, 8, 10, 11, 28, or 29 including, without limitation, the 
sequence starting at nucleotide number 1 and ending at nucleotide number 15, the 
sequence starting at nucleotide number 2 and ending at nucleotide number 16, the 
sequence starting at nucleotide number 3 and ending at nucleotide number 17, and so
forth. It will be appreciated that the invention also provides isolated nucleic acid that
contains a nucleotide sequence that is greater than 15 nucleotides (e.g., 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more nucleotides) in length and identical to any 
portion of the sequence depicted in Figure 2, 3, 7, 8, 10, 11, 28, or 29. For example, the
invention provides isolated nucleic acid that contains a 25 nucleotide sequence identical
to any 25 nucleotide sequence depicted in Figure 2, 3, 7, 8, 10, 11, 28, or 29 including,
without limitation, the sequence starting at nucleotide number 1 and ending at nucleotide 
variations. For example, the STdxsdna sequence can contain one variation provided in
Figure 5 or more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of 
the variations provided in Figure 5. It is noted that the full-length nucleic acid sequences 
depicted in Figure 5 can encode polypeptides having DXS activity. It also is noted that
the nucleic acid sequence depicted in Figure 2 contains the nucleic acid sequence
depicted in Figure 3. 

Figure 13 depicts the nucleic acid sequence depicted in Figure 8 (designated 
RSddsdna) and the nucleic acid sequence depicted in Figure 11 (designated STddsdna)
aligned with each other as well as aligned with three other nucleic acid sequences.
Examples of variations of the RSddsdna sequence include, without limitation, any
variation of the RSddsdna sequence provided in Figure 13. Examples of variations of the
STddsdna sequence include, without limitation, any variation of the STddsdna sequence 
provided in Figure 13. Such variations are provided in Figure 13 in that a comparison of 
the nucleotide (or lack thereof) at a particular position of the RSddsdna sequence or the
STddsdna sequence with the nucleotide (or lack thereof) at the same position of any of
the other nucleic acid sequences depicted in Figure 13 provides a list of specific changes 
for the RSddsdna sequence and the STddsdna sequence. For example, the "a" at position 
511 of the RSddsdna sequence or the "a" at position 756 of the STddsdna sequence can
be substituted with a "t" as indicated in Figure 13. Again, it will be appreciated that the
RSddsdna sequence as well as the STddsdna sequence can contain any number of
variations as well as any combination of types of variations. For example, the RSddsdna
sequence can contain one variation provided in Figure 13 or more than one (e.g., 2, 3, 4, 
5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations provided in Figure 13. 
Likewise, the STddsdna sequence can contain one variation provided in Figure 13 or
more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations
provided in Figure 13. It is noted that the full-length nucleic acid sequences depicted in 
Figure 13 can encode polypeptides having DDS activity. It also is noted that the nucleic 
acid sequence depicted in Figure 7 contains the nucleic acid sequence depicted in Figure 
8 and that the nucleic acid sequence depicted in Figure 10 contains the nucleic acid
sequence depicted in Figure 11.
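
The position-by-position comparison described above can be illustrated with a short Python sketch. The aligned sequences below are hypothetical toy data and are not taken from Figure 13; the sketch also assumes, for simplicity, that positions are counted by alignment column and that "-" marks the lack of a nucleotide.

```python
def list_variations(reference, others):
    """Compare an aligned reference sequence with other aligned sequences.

    Returns (column, reference_symbol, alternative_symbol) tuples, where
    column is the 1-based alignment column and "-" denotes a gap.
    """
    if any(len(sequence) != len(reference) for sequence in others):
        raise ValueError("all aligned sequences must have the same length")
    variations = []
    for column, reference_symbol in enumerate(reference, start=1):
        # Collect every symbol other sequences show at this column.
        alternatives = {sequence[column - 1] for sequence in others}
        for alternative in sorted(alternatives - {reference_symbol}):
            variations.append((column, reference_symbol, alternative))
    return variations


# Hypothetical alignment of a reference with two other sequences.
print(list_variations("atgc-a", ["atgcta", "aagc-a"]))
```

Each tuple corresponds to one specific change: a substitution when both symbols are nucleotides, an insertion when the reference has a gap, and a deletion when the other sequence has a gap.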

The nucleic acid sequence depicted in Figure 7 contains a nucleic acid sequence 
that encodes a R. sphaeroides (ATCC 17023) polypeptide having DDS activity. Another
variant of this nucleic acid sequence is the nucleic acid sequence of a clone isolated from
R. sphaeroides (ATCC 35053). Briefly, a R. sphaeroides (ATCC 35053) clone was
identified and found to contain a sequence identical to the nucleic acid sequence depicted
in Figure 7 with the following three exceptions. The R. sphaeroides (ATCC 35053) clone
has a "t" at position 885 rather than a "c", a "c" inserted after the "c" at position 1620,
and a "c" inserted after the V at position 1733. 
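
A variant of this kind, defined by substitutions and insertions at numbered positions of a reference sequence, can be applied programmatically. The following Python sketch is illustrative only; it uses a hypothetical six-nucleotide sequence rather than the sequence of Figure 7, and the function name `apply_variations` is an assumption.

```python
def apply_variations(sequence, substitutions=None, insertions=None):
    """Apply substitutions and insertions given by 1-based reference positions.

    substitutions maps a position to its replacement nucleotide; insertions
    maps a position to the nucleotide(s) inserted after that position.
    """
    substitutions = substitutions or {}
    insertions = insertions or {}
    result = []
    for position, nucleotide in enumerate(sequence, start=1):
        result.append(substitutions.get(position, nucleotide))
        result.append(insertions.get(position, ""))
    return "".join(result)


# Hypothetical example: "t" rather than "c" at position 3, and a "c"
# inserted after the "c" at position 4.
print(apply_variations("gacctg", substitutions={3: "t"}, insertions={4: "c"}))
```

Because every position refers to the numbering of the reference sequence, the order in which the variations are listed does not affect the result.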

The nucleic acid depicted in Figure 8 also contains a nucleic acid sequence that 
encodes a R. sphaeroides (ATCC 17023) polypeptide having DDS activity. Another
variant of this nucleic acid sequence is the nucleic acid sequence of a clone isolated from
R. sphaeroides (ATCC 35053). Briefly, a R. sphaeroides (ATCC 35053) clone was
identified and found to contain a sequence identical to the nucleic acid sequence depicted 
in Figure 8 with the following exception. The R. sphaeroides (ATCC 35053) clone has a 
"t" at position 514 rather than a "c". 

Figure 31 depicts the nucleic acid sequence depicted in Figure 29 (designated
Stdxrcds) aligned with eleven other nucleic acid sequences. Examples of variations of the
Stdxrcds sequence include, without limitation, any variation of the Stdxrcds sequence
provided in Figure 31. Such variations are provided in Figure 31 in that a comparison of
the nucleotide (or lack thereof) at a particular position of the Stdxrcds sequence with the
nucleotide (or lack thereof) at the same position of any of the other nucleic acid
sequences depicted in Figure 31 provides a list of specific changes for the Stdxrcds 
sequence. Again, it will be appreciated that the Stdxrcds sequence can contain any 
number of variations as well as any combination of types of variations. For example, the 
Stdxrcds sequence can contain one variation provided in Figure 31 or more than one (e.g.,
2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations provided in Figure
31. It is noted that the full-length nucleic acid sequences depicted in Figure 31 can
encode polypeptides having DXR activity. It also is noted that the nucleic acid sequence 
depicted in Figure 29 contains the nucleic acid sequence depicted in Figure 28. 

The invention also provides isolated nucleic acid that contains a variant of a
portion of the nucleic acid sequence depicted in Figure 2, 3, 7, 8, 10, 11, 28, or 29 as
described herein. 
The invention provides isolated nucleic acid that contains a nucleic acid sequence
that encodes the entire amino acid sequence depicted in Figure 4, 9, 12, or 30. In 
addition, the invention provides isolated nucleic acid that contains a nucleic acid 
sequence that encodes a portion of the amino acid sequence depicted in Figure 4, 9, 12, or 
30. For example, the invention provides isolated nucleic acid that contains a nucleic acid
sequence that encodes a 15 amino acid sequence identical to any 15 amino acid sequence 
depicted in Figure 4, 9, 12, or 30 including, without limitation, the sequence starting at 
amino acid residue number 1 and ending at amino acid residue number 15, the sequence 
starting at amino acid residue number 2 and ending at amino acid residue number 16, the
sequence starting at amino acid residue number 3 and ending at amino acid residue
number 17, and so forth. It will be appreciated that the invention also provides isolated
nucleic acid that contains a nucleic acid sequence that encodes an amino acid sequence 
that is greater than 15 amino acid residues (e.g., 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, or more amino acid residues) in length and identical to any portion of the
sequence depicted in Figure 4, 9, 12, or 30. For example, the invention provides isolated
nucleic acid that contains a nucleic acid sequence that encodes a 25 amino acid sequence 
identical to any 25 amino acid sequence depicted in Figure 4, 9, 12, or 30 including, 
without limitation, the sequence starting at amino acid residue number 1 and ending at 
amino acid residue number 25, the sequence starting at amino acid residue number 2 and
ending at amino acid residue number 26, the sequence starting at amino acid residue
number 3 and ending at amino acid residue number 27, and so forth. Additional 
examples include, without limitation, isolated nucleic acids that contain a nucleic acid 
sequence that encodes an amino acid sequence that is 50 or more amino acid residues 
(e.g., 100, 150, 200, 250, 300, 350, or more amino acid residues) in length and identical
to any portion of the sequence depicted in Figure 4, 9, 12, or 30. Such isolated nucleic
acids can include, without limitation, those isolated nucleic acids containing a nucleic 
acid sequence that encodes an amino acid sequence represented in a single line of 
sequence depicted in Figure 4, 9, 12, or 30 since each line of sequence depicted in these 
figures, with the exception of the last line, provides a 50 amino acid sequence. 
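
The enumeration of fixed-length portions described above (residues 1 to 15, residues 2 to 16, and so forth) can be sketched in Python. The sequence used below is a hypothetical 20-residue example and is not taken from any of the figures; the function name `windows` is likewise hypothetical.

```python
def windows(sequence, size=15):
    """Yield (start, end, portion) for every contiguous portion of the given size.

    start and end are 1-based, inclusive residue numbers.
    """
    for offset in range(len(sequence) - size + 1):
        yield offset + 1, offset + size, sequence[offset:offset + size]


# Hypothetical 20-residue sequence: six 15-residue portions exist.
for start, end, portion in windows("MKTAYIAKQRQISFVKSHFS", size=15):
    print(start, end, portion)
```

A sequence of length N contains N - size + 1 such portions, and the same function applies to nucleotide sequences by passing a nucleotide string.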

In addition, the invention provides isolated nucleic acid that contains a nucleic
acid sequence that encodes an amino acid sequence having a variation of the amino acid
sequence depicted in Figure 4, 9, 12, or 30. For example, the invention provides isolated
nucleic acid containing a nucleic acid sequence encoding an amino acid sequence 
depicted in Figure 4, 9, 12, or 30 that contains a single insertion, a single deletion, a 
single substitution, multiple insertions, multiple deletions, multiple substitutions, or any
combination thereof (e.g., single deletion together with multiple insertions). The
invention provides multiple examples of isolated nucleic acid containing a nucleic acid
sequence encoding an amino acid sequence having a variation of an amino acid sequence 
depicted in Figure 4, 9, 12, or 30. 

Figure 6 depicts the amino acid sequence depicted in Figure 4 (designated
STdxsp) aligned with 20 other amino acid sequences. Examples of variations of the
STdxsp sequence include, without limitation, any variation of the STdxsp sequence 
provided in Figure 6. Such variations are provided in Figure 6 in that a comparison of the 
amino acid residue (or lack thereof) at a particular position of the STdxsp sequence with 
the amino acid residue (or lack thereof) at the same position of any of the other 20 amino
acid sequences depicted in Figure 6 provides a list of specific changes for the STdxsp
sequence. For example, the "t" at position 1148 of the STdxsp sequence can be
substituted with an "s" as indicated in Figure 6. As also indicated in Figure 6, the "f" at
position 575 of the STdxsp sequence can be substituted with an "m", "a", "l", "i", "y", or
"v". For Figure 6, the nucleic acid numbering of Figure 2 is used to number the amino
acid residue positions of the STdxsp sequence. Thus, the first amino acid residue of the
STdxsp sequence starts with number 182 and proceeds in increments of three. It will be
appreciated that the STdxsp sequence can contain any number of variations as well as any 
combination of types of variations. For example, the STdxsp sequence can contain one 
variation provided in Figure 6 or more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25,
50, 100, or more) of the variations provided in Figure 6. It is noted that the 21 full-length
amino acid sequences depicted in Figure 6 can be polypeptides having DXS activity. 
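
The numbering convention used for Figure 6, in which the first amino acid residue of the STdxsp sequence is numbered 182 and each subsequent residue advances the number by three, can be converted to and from ordinary residue numbers. The following Python sketch is illustrative only, and the function names are hypothetical.

```python
def residue_to_position(residue_number, first_position=182):
    """Convert a 1-based residue number to its nucleic acid based position number."""
    return first_position + 3 * (residue_number - 1)


def position_to_residue(position, first_position=182):
    """Convert a nucleic acid based position number back to a 1-based residue number."""
    offset = position - first_position
    if offset < 0 or offset % 3 != 0:
        raise ValueError("position does not correspond to the start of a codon")
    return offset // 3 + 1


# Positions 575 and 1148 are the two positions cited above for STdxsp.
print(position_to_residue(575), position_to_residue(1148))
print(residue_to_position(1))
```

Under this convention, positions 575 and 1148 correspond to residues 132 and 323, respectively, counted from the first residue of the STdxsp sequence.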

Figure 14 depicts the amino acid sequence depicted in Figure 9 (designated 
RSddsp) and the amino acid sequence depicted in Figure 12 (designated STddsp) aligned 
with each other as well as aligned with three other amino acid sequences. For Figure 14,
the nucleic acid numbering of Figure 7 is used to number the amino acid residue positions
of the RSddsp sequence, and the nucleic acid numbering of Figure 10 is used to number
the amino acid residue positions of the STddsp sequence. Thus, the first amino acid
residue of the RSddsp and STddsp sequences each start with a number other than 1 and 
proceed in increments of three. Examples of variations of the RSddsp sequence include, 
without limitation, any variation of the RSddsp sequence provided in Figure 14.
Examples of variations of the STddsp sequence include, without limitation, any variation
of the STddsp sequence provided in Figure 14. Such variations are provided in Figure 14 
in that a comparison of the amino acid residue (or lack thereof) at a particular position of 
the RSddsp sequence or the STddsp sequence with the amino acid residue (or lack 
thereof) at the same position of any of the other amino acid sequences depicted in Figure
14 provides a list of specific changes for the RSddsp sequence and the STddsp sequence.
For example, the "l" at position 762 of the RSddsp sequence or the "l" at position 1007 of
the STddsp sequence can be substituted with an "a" as indicated in Figure 14. Again, it 
will be appreciated that the RSddsp sequence as well as the STddsp sequence can contain 
any number of variations as well as any combination of types of variations. For example,
the RSddsp sequence can contain one variation provided in Figure 14 or more than one
(e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations provided in 
Figure 14. Likewise, the STddsp sequence can contain one variation provided in Figure 
14 or more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the 
variations provided in Figure 14. It is noted that the five full-length amino acid sequences
depicted in Figure 14 can be polypeptides having DDS activity.

The amino acid sequence depicted in Figure 9 represents a R. sphaeroides (ATCC 
17023) polypeptide having DDS activity. Another variant of this amino acid sequence is 
the amino acid sequence encoded by a clone isolated from R. sphaeroides (ATCC 35053).
Briefly, a R. sphaeroides (ATCC 35053) clone was identified and found to encode an
amino acid sequence identical to the amino acid sequence depicted in Figure 9 with the
following exception. The R. sphaeroides (ATCC 35053) clone has a "y" at position 172
rather than an "h".

Figure 32 depicts the amino acid sequence depicted in Figure 30 (designated 
Stdxrp) aligned with 15 other amino acid sequences. Examples of variations of the 
Stdxrp sequence include, without limitation, any variation of the Stdxrp sequence
provided in Figure 32. Such variations are provided in Figure 32 in that a comparison of
the amino acid residue (or lack thereof) at a particular position of the Stdxrp sequence
with the amino acid residue (or lack thereof) at the same position of any of the other 15 
amino acid sequences depicted in Figure 32 provides a list of specific changes for the 
Stdxrp sequence. It will be appreciated that the Stdxrp sequence can contain any number 
of variations as well as any combination of types of variations. For example, the Stdxrp
sequence can contain one variation provided in Figure 32 or more than one (e.g., 2, 3, 4, 
5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations provided in Figure 32. It is 
noted that the full-length amino acid sequences depicted in Figure 32 can be polypeptides 
having DXR activity. 
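
The position-by-position comparison described above can be sketched in code. This is a minimal illustration only, assuming the aligned sequences are available as equal-length strings with "-" marking a gap (a lack of residue); the function name and the toy alignment are hypothetical and are not taken from Figure 32.

```python
def list_variations(reference, others):
    """Compare an aligned reference sequence with other aligned sequences.

    Returns sorted (alignment_column, reference_residue, alternative_residue)
    tuples, one for each distinct difference found. Columns are 1-based
    alignment columns, not the figure's own numbering.
    """
    variations = set()
    for other in others:
        if len(other) != len(reference):
            raise ValueError("aligned sequences must have equal length")
        for column, (ref_res, alt_res) in enumerate(zip(reference, other), start=1):
            if ref_res != alt_res:
                variations.add((column, ref_res, alt_res))
    return sorted(variations)


# Hypothetical toy alignment; "-" marks a position with no residue.
reference = "mk-lta"
others = ["mkalsa", "mr-lta"]
variations = list_variations(reference, others)
for column, ref_res, alt_res in variations:
    print(f"column {column}: {ref_res!r} -> {alt_res!r}")
```

A reference gap paired with a residue corresponds to an insertion, a residue paired with a gap to a deletion, and two different residues to a substitution.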

The invention also provides isolated nucleic acid containing a nucleic acid
sequence encoding an amino acid sequence that contains a variant of a portion of the 
amino acid sequence depicted in Figure 4, 9, 12, or 30 as described herein. 

2. Polypeptides 

The invention provides substantially pure polypeptides. The term "substantially
pure" as used herein with reference to a polypeptide means the polypeptide is 
substantially free of other polypeptides, lipids, carbohydrates, and nucleic acid with 
which it is naturally associated. Thus, a substantially pure polypeptide is any polypeptide 
that is removed from its natural environment and is at least 60 percent pure. A
substantially pure polypeptide can be at least about 65, 70, 75, 80, 85, 90, 95, or 99
percent pure. Typically, a substantially pure polypeptide will yield a single major band 
on a non-reducing polyacrylamide gel. 

Any substantially pure polypeptide having an amino acid sequence encoded by a 
nucleic acid within the scope of the invention is itself within the scope of the invention.
In addition, any substantially pure polypeptide containing an amino acid sequence having
a length and a percent identity to the sequence set forth in SEQ ID NO:3 over that length 
as determined herein is within the scope of the invention provided the point defined by 
that length and percent identity is within the area defined by points A, B, C, and D of 
Figure 26; where point A has an X coordinate less than or equal to 641, and a Y
coordinate less than or equal to 100; where point B has an X coordinate less than or equal
to 641, and a Y coordinate greater than or equal to 50; where point C has an X coordinate
greater than or equal to 25, and a Y coordinate greater than or equal to 50; and where
point D has an X coordinate greater than or equal to 5, and a Y coordinate less than or 
equal to 100. For example, the X coordinate for point A can be 641, 635, 630, 625, 620, 
or less; and the Y coordinate for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X 
coordinate for point B can be 641, 635, 630, 625, 620, or less; and the Y coordinate for
point B can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for 
point C can be 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more; and the Y 
coordinate for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X 
coordinate for point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, or more; and
the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one
embodiment, point A can be (641, 100), point B can be (641, 95), point C can be (400,
95), and point D can be (400, 100).

Any substantially pure polypeptide containing an amino acid sequence having a 
length and a percent identity to the sequence set forth in SEQ ID NO:39 over that length
as determined herein is within the scope of the invention provided the point defined by
that length and percent identity is within the area defined by points A, B, C, and D of 
Figure 26; where point A has an X coordinate less than or equal to 333, and a Y 
coordinate less than or equal to 100; where point B has an X coordinate less than or equal 
to 333, and a Y coordinate greater than or equal to 50; where point C has an X coordinate
greater than or equal to 25, and a Y coordinate greater than or equal to 50; and where
point D has an X coordinate greater than or equal to 5, and a Y coordinate less than or 
equal to 100. For example, the X coordinate for point A can be 333, 330, 325, 320, 315, 
or less; and the Y coordinate for point A can be 100, 99, 95, 90, 85, 80, 75, or less. The X 
coordinate for point B can be 333, 330, 325, 320, 315, or less; and the Y coordinate for
point B can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for
point C can be 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more; and the Y 
coordinate for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X 
coordinate for point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, or more; and 
the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one
embodiment, point A can be (333, 100), point B can be (333, 95), point C can be (150,
95), and point D can be (150, 100).

Any substantially pure polypeptide containing an amino acid sequence having a
length and a percent identity to the sequence set forth in SEQ ID NO:42 over that length 
as determined herein is within the scope of the invention provided the point defined by 
that length and percent identity is within the area defined by points A, B, C, and D of 
Figure 26; where point A has an X coordinate less than or equal to 337, and a Y
coordinate less than or equal to 100; where point B has an X coordinate less than or equal 
to 337, and a Y coordinate greater than or equal to 50; where point C has an X coordinate 
greater than or equal to 25, and a Y coordinate greater than or equal to 50; and where 
point D has an X coordinate greater than or equal to 5, and a Y coordinate less than or
equal to 100. For example, the X coordinate for point A can be 337, 335, 330, 325, 320,
315, or less; and the Y coordinate for point A can be 100, 99, 95, 90, 85, 80, 75, or less.
The X coordinate for point B can be 337, 335, 330, 325, 320, 315, or less; and the Y
coordinate for point B can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X 
coordinate for point C can be 25, 30, 35, 40, 50, 60, 70, 80, 90, 100, 150, 200, or more;
and the Y coordinate for point C can be 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 99 or more.
The X coordinate for point D can be 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, 40, 50, 75, 100, or 
more; and the Y coordinate for point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one 
embodiment, point A can be (337, 100), point B can be (337, 95), point C can be (150, 
95), and point D can be (150, 100). 

Any substantially pure polypeptide containing an amino acid sequence having a
length and a percent identity to the sequence set forth in SEQ ID NO:97 over that length 
as determined herein is within the scope of the invention provided the point defined by 
that length and percent identity is within the area defined by points A, B, C, and D of 
Figure 26; where point A has an X coordinate less than or equal to 386, and a Y
coordinate less than or equal to 100; where point B has an X coordinate less than or equal
to 386, and a Y coordinate greater than or equal to 50; where point C has an X coordinate 
greater than or equal to 25, and a Y coordinate greater than or equal to 50; and where 
point D has an X coordinate greater than or equal to 5, and a Y coordinate less than or 
equal to 100. For example, the X coordinate for point A can be 386, 380, 375, 370, 375,
360, 365, 350, 325, 300, or less; and the Y coordinate for point A can be 100, 99, 95, 90,
85, 80, 75, or less. The X coordinate for point B can be 386, 380, 375, 370, 375, 360,
365, 350, 325, 300, or less; and the Y coordinate for point B can be 50, 55, 60, 65, 70, 75,
80, 85, 90, 95, 99 or more. The X coordinate for point C can be 25, 30, 35, 40, 50, 60, 70, 
80, 90, 100, 150, 200, 300, 350, or more; and the Y coordinate for point C can be 50, 55, 
60, 65, 70, 75, 80, 85, 90, 95, 99 or more. The X coordinate for point D can be 5, 6, 7, 8, 
9, 10, 15, 20, 25, 30, 40, 50, 75, 100, 200, 300, 350, or more; and the Y coordinate for
point D can be 100, 99, 95, 90, 85, 80, 75, or less. In one embodiment, point A can be 
(386, 100), point B can be (386, 95), point C can be (350, 95), and point D can be (350, 
100). 
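
The area test described in the preceding paragraphs can be illustrated with a short sketch. It assumes that the X coordinate is the length of the compared sequence and the Y coordinate is the percent identity over that length, and that points A, B, C, and D are listed in order around a convex area; the function name is hypothetical, and the corner values are those of the SEQ ID NO:3 embodiment given above.

```python
def point_in_area(point, corners):
    """Return True if point lies inside or on the edge of a convex area.

    corners lists the area's vertices in order (clockwise or
    counter-clockwise). The point is inside when it lies on the same
    side of every edge, which is checked with 2D cross products.
    """
    px, py = point
    sides = []
    for i in range(len(corners)):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % len(corners)]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:  # zero means the point is on this edge's line
            sides.append(cross > 0)
    return all(sides) or not any(sides)


# Corners A, B, C, D from the SEQ ID NO:3 embodiment: (length, percent identity).
corners = [(641, 100), (641, 95), (400, 95), (400, 100)]

print(point_in_area((500, 97), corners))  # length 500, 97 percent identity
print(point_in_area((300, 97), corners))  # too short for this embodiment
print(point_in_area((500, 90), corners))  # identity below 95 percent
```

With these corners the area is a rectangle, so the check reduces to 400 <= length <= 641 and 95 <= percent identity <= 100; the cross-product form also covers non-rectangular choices of the four points.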

Any method can be used to obtain a substantially pure polypeptide. For example,
common polypeptide purification techniques such as affinity chromatography and HPLC
as well as polypeptide synthesis techniques can be used. In addition, any material can be 
used as a source to obtain a substantially pure polypeptide. For example, tissue from 
wild-type or transgenic animals can be used as a source material. In addition, tissue 
culture cells engineered to over-express a particular polypeptide of interest can be used to
obtain substantially pure polypeptide. Further, a polypeptide within the scope of the
invention can be "engineered" to contain an amino acid sequence that allows the 
polypeptide to be captured onto an affinity matrix. For example, a tag such as c-myc, 
hemagglutinin, polyhistidine, or Flag™ tag (Kodak) can be used to aid polypeptide 
purification. Such tags can be inserted anywhere within the polypeptide including at
either the carboxyl or amino termini. Other fusions that could be useful include enzymes
that aid in the detection of the polypeptide, such as alkaline phosphatase. 

The invention provides polypeptides that contain the entire amino acid sequence 
depicted in Figure 4, 9, 12, or 30. In addition, the invention provides polypeptides that 
contain a portion of the amino acid sequence depicted in Figure 4, 9, 12, or 30. For
example, the invention provides polypeptides that contain a 15 amino acid sequence
identical to any 15 amino acid sequence depicted in Figure 4, 9, 12, or 30 including, 
without limitation, the sequence starting at amino acid residue number 1 and ending at 
amino acid residue number 15, the sequence starting at amino acid residue number 2 and 
ending at amino acid residue number 16, the sequence starting at amino acid residue
number 3 and ending at amino acid residue number 17, and so forth. It will be
appreciated that the invention also provides polypeptides that contain an amino acid
sequence that is greater than 15 amino acid residues (e.g., 16, 17, 18, 19, 20, 21, 22, 23,
24, 25, 26, 27, 28, 29, 30, or more amino acid residues) in length and identical to any 
portion of the sequence depicted in Figure 4, 9, 12, or 30. For example, the invention 
provides polypeptides that contain a 25 amino acid sequence identical to any 25 amino
acid sequence depicted in Figure 4, 9, 12, or 30 including, without limitation, the
sequence starting at amino acid residue number 1 and ending at amino acid residue 
number 25, the sequence starting at amino acid residue number 2 and ending at amino 
acid residue number 26, the sequence starting at amino acid residue number 3 and ending 
at amino acid residue number 27, and so forth. Additional examples include, without
limitation, polypeptides that contain an amino acid sequence that is 50 or more amino
acid residues (e.g., 100, 150, 200, 250, 300, 350, or more amino acid residues) in length 
and identical to any portion of the sequence depicted in Figure 4, 9, 12, or 30. Such 
polypeptides can include, without limitation, those polypeptides containing an amino acid
sequence represented in a single line of sequence depicted in Figure 4, 9, 12, or 30 since
each line of sequence depicted in these figures, with the possible exception of the last
line, provides a 50 amino acid sequence. 
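
The enumeration of portions (residues 1 to 15, 2 to 16, 3 to 17, and so forth) amounts to a sliding window over the sequence. The following is a minimal sketch of that enumeration; the function name and the 20-residue example string are hypothetical and do not come from any figure.

```python
def sliding_windows(sequence, size=15):
    """Yield (start, end, fragment) for every contiguous window of `size`.

    start and end are 1-based, inclusive residue numbers, matching the
    "residue number 1 ... residue number 15" wording used in the text.
    """
    for offset in range(len(sequence) - size + 1):
        yield offset + 1, offset + size, sequence[offset:offset + size]


# Hypothetical 20-residue sequence used only for illustration.
example = "acdefghiklmnpqrstvwy"
windows = list(sliding_windows(example, size=15))
for start, end, fragment in windows:
    print(start, end, fragment)
```

A sequence of N residues yields N - size + 1 windows, so the same function with size=25 or size=50 enumerates the longer portions mentioned above.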

In addition, the invention provides polypeptides that contain an amino acid sequence
having a variation of the amino acid sequence depicted in Figure 4, 9, 12, or 30. For 
example, the invention provides polypeptides containing an amino acid sequence depicted
in Figure 4, 9, 12, or 30 that contains a single insertion, a single deletion, a single
substitution, multiple insertions, multiple deletions, multiple substitutions, or any 
combination thereof (e.g., single deletion together with multiple insertions). The 
invention provides multiple examples of polypeptides containing an amino acid sequence 
having a variation of an amino acid sequence depicted in Figure 4, 9, 12, or 30. 

Figure 6 depicts the amino acid sequence depicted in Figure 4 (designated
STdxsp) aligned with 20 other amino acid sequences. Examples of variations of the 
STdxsp sequence include, without limitation, any variation of the STdxsp sequence 
provided in Figure 6. Such variations are provided in Figure 6 in that a comparison of the 
amino acid residue (or lack thereof) at a particular position of the STdxsp sequence with
the amino acid residue (or lack thereof) at the same position of any of the other 20 amino
acid sequences depicted in Figure 6 provides a list of specific changes for the STdxsp
sequence. For example, the "t" at position 1148 of the STdxsp sequence can be
substituted with an "s" as indicated in Figure 6. As also indicated in Figure 6, the "f" at
position 575 of the STdxsp sequence can be substituted with an "m", "a", "l", "i", "y", or
"v". For Figure 6, the nucleic acid numbering of Figure 2 is used to number the amino
acid residue positions of the STdxsp sequence. Thus, the first amino acid residue of the
STdxsp sequence starts with number 182 and proceeds in increments of three. It will be
appreciated that the STdxsp sequence can contain any number of variations as well as any 
combination of types of variations. For example, the STdxsp sequence can contain one 
variation provided in Figure 6 or more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25,
50, 100, or more) of the variations provided in Figure 6. It is noted that the 21 full-length
amino acid sequences depicted in Figure 6 can be polypeptides having DXS activity. 
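
The numbering convention used for Figure 6 can be made concrete with a small conversion sketch. It assumes that the first residue of the STdxsp sequence carries number 182 and that each following residue adds three, as stated above; the function names are hypothetical.

```python
FIRST_POSITION = 182  # number assigned to the first STdxsp residue
STEP = 3              # one codon, i.e. three nucleotides, per residue


def residue_to_position(residue_index, first_position=FIRST_POSITION):
    """Map a 1-based residue index to its nucleic-acid-based position number."""
    return first_position + STEP * (residue_index - 1)


def position_to_residue(position, first_position=FIRST_POSITION):
    """Map a nucleic-acid-based position number back to a 1-based residue index."""
    offset = position - first_position
    if offset < 0 or offset % STEP != 0:
        raise ValueError("position does not fall on a residue number")
    return offset // STEP + 1


print(position_to_residue(575))   # the "f" mentioned above
print(position_to_residue(1148))  # the "t" mentioned above
```

Under these assumptions, position 575 corresponds to residue 132 and position 1148 to residue 323 of the STdxsp sequence. The same arithmetic applies to Figure 14 once the first position number of each sequence is known.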

Figure 14 depicts the amino acid sequence depicted in Figure 9 (designated 
RSddsp) and the amino acid sequence depicted in Figure 12 (designated STddsp) aligned 
with each other as well as aligned with three other amino acid sequences. For Figure 14,
the nucleic acid numbering of Figure 7 is used to number the amino acid residue positions
of the RSddsp sequence, and the nucleic acid numbering of Figure 10 is used to number 
the amino acid residue positions of the STddsp sequence. Thus, the first amino acid 
residue of the RSddsp and STddsp sequences each start with a number other than 1 and 
proceed in increments of three. Examples of variations of the RSddsp sequence include,
without limitation, any variation of the RSddsp sequence provided in Figure 14.
Examples of variations of the STddsp sequence include, without limitation, any variation 
of the STddsp sequence provided in Figure 14. Such variations are provided in Figure 14 
in that a comparison of the amino acid residue (or lack thereof) at a particular position of 
the RSddsp sequence or the STddsp sequence with the amino acid residue (or lack
thereof) at the same position of any of the other amino acid sequences depicted in Figure
14 provides a list of specific changes for the RSddsp sequence and the STddsp sequence.
For example, the "l" at position 762 of the RSddsp sequence or the "l" at position 1007 of
the STddsp sequence can be substituted with an "a" as indicated in Figure 14. Again, it 
will be appreciated that the RSddsp sequence as well as the STddsp sequence can contain
any number of variations as well as any combination of types of variations. For example,
the RSddsp sequence can contain one variation provided in Figure 14 or more than one
(e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations provided in
Figure 14. Likewise, the STddsp sequence can contain one variation provided in Figure 
14 or more than one (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the 
variations provided in Figure 14. It is noted that the five full-length amino acid sequences 
depicted in Figure 14 can be polypeptides having DDS activity.

Figure 32 depicts the amino acid sequence depicted in Figure 30 (designated 
Stdxrp) aligned with 15 other amino acid sequences. Examples of variations of the 
Stdxrp sequence include, without limitation, any variation of the Stdxrp sequence 
provided in Figure 32. Such variations are provided in Figure 32 in that a comparison of
the amino acid residue (or lack thereof) at a particular position of the Stdxrp sequence
with the amino acid residue (or lack thereof) at the same position of any of the other 15 
amino acid sequences depicted in Figure 32 provides a list of specific changes for the 
Stdxrp sequence. It will be appreciated that the Stdxrp sequence can contain any number 
of variations as well as any combination of types of variations. For example, the Stdxrp
sequence can contain one variation provided in Figure 32 or more than one (e.g., 2, 3, 4,
5, 6, 7, 8, 9, 10, 15, 20, 25, 50, 100, or more) of the variations provided in Figure 32. It is 
noted that the full-length amino acid sequences depicted in Figure 32 can be polypeptides 
having DXR activity. 

The invention also provides polypeptides containing an amino acid sequence that
contains a variant of a portion of the amino acid sequence depicted in Figure 4, 9, 12, or
30 as described herein. 

3. Genetically modified cells 

Any cell containing an isolated nucleic acid within the scope of the invention is
itself within the scope of the invention. This includes, without limitation, prokaryotic
cells such as cells from the Rhodospirillaceae family (e.g., Rhodobacter cells) and 
eukaryotic cells such as plant and mammalian cells. It is noted that cells containing an 
isolated nucleic acid of the invention are not required to express the isolated nucleic acid. 
In addition, the isolated nucleic acid can be integrated into the genome of the cell or
maintained in an episomal state. In other words, cells can be stably or transiently
transformed with an isolated nucleic acid of the invention.

Any method can be used to introduce an isolated nucleic acid into a cell. In fact,
many methods for introducing nucleic acid into a cell, whether in vivo or in vitro, are well 
known to those skilled in the art. For example, calcium phosphate precipitation, 
electroporation, heat shock, lipofection, microinjection, conjugation, and viral-mediated 
nucleic acid transfer are common methods that can be used to introduce nucleic acid into
a cell. In addition, naked DNA can be delivered directly to cells in vivo as described
elsewhere (U.S. Patent Number 5,580,859 and U.S. Patent Number 5,589,466 including 
continuations thereof). Further, nucleic acid can be introduced into cells by generating 
transgenic animals. 

Any method can be used to identify cells that contain an isolated nucleic acid
within the scope of the invention. For example, PCR and nucleic acid hybridization
techniques such as Northern and Southern analysis can be used. In some cases, 
immunohistochemistry and biochemical techniques can be used to determine if a cell 
contains a particular nucleic acid by detecting the expression of a polypeptide encoded by
that particular nucleic acid. For example, detection of polypeptide X-immunoreactivity
after introduction of an isolated nucleic acid containing a cDNA that encodes polypeptide 
X into a cell that does not normally express polypeptide X can indicate that that cell not 
only contains the introduced nucleic acid but also expresses the encoded polypeptide X 
from that introduced nucleic acid. In this case, the detection of any enzymatic activities
of polypeptide X also can indicate that that cell contains the introduced nucleic acid and
expresses the encoded polypeptide X from that introduced nucleic acid. 

Any method can be used to direct the expression of an amino acid sequence from 
a nucleic acid. Such methods are well known to those skilled in the art, and include, 
without limitation, constructing a nucleic acid such that a regulatory element drives the
expression of a nucleic acid sequence that encodes a polypeptide. Typically, regulatory
elements are DNA sequences that regulate the expression of other DNA sequences at the 
level of transcription. Such regulatory elements include, without limitation, promoters, 
enhancers, and the like. In addition, any method for expressing a polypeptide from an 
exogenous nucleic acid molecule in microorganisms such as bacteria and yeast can be
used. For example, well-known methods for making and using nucleic acid constructs
that are capable of expressing exogenous polypeptides within Rhodobacter species (e.g.,
R. sphaeroides and R. capsulatus) can be used. See, e.g., Dryden and Dowhan, J.
Bacteriol., 178(4):1030-1038 (1996); Vasilyeva et al., Applied Biochemistry and
Biotechnology, 77-79:337-345 (1999); Graichen et al., J. Bacteriol., 181(14):4216-4222
(1999); Johnson et al., J. Bacteriol., 167(2):604-610 (1986); and Duport et al., Gene,
145:103-108 (1994). Further, any methods can be used to identify cells that express an
amino acid sequence from a nucleic acid. Such methods are well known to those skilled 
in the art, and include, without limitation, immunocytochemistry, Western analysis, 
Northern analysis, and RT-PCR. 

The cells described herein can contain a single copy, or multiple copies (e.g.,
about 5, 10, 20, 35, 50, 75, 100 or 150 copies), of a particular exogenous nucleic acid.
For example, a bacterial cell can contain about 50 copies of exogenous nucleic acid X. In 
addition, the cells described herein can contain more than one particular exogenous 
nucleic acid. For example, a bacterial cell can contain about 50 copies of exogenous 
nucleic acid X as well as about 75 copies of exogenous nucleic acid Y. In these cases,
each different nucleic acid can encode a different polypeptide having its own unique
enzymatic activity. For example, a bacterial cell can contain two different exogenous 
nucleic acids such that a high level of CoQ(10) is produced. In this example, such a cell 
can contain a first exogenous nucleic acid that encodes a polypeptide having DXS activity 
and a second exogenous nucleic acid that encodes a polypeptide having DDS activity. In
addition, a single exogenous nucleic acid can encode one or more than one polypeptide.
For example, a single nucleic acid can contain sequences that encode three different 
polypeptides. 

In addition to providing cells that contain an isolated nucleic acid of the invention, 
the invention provides cells (e.g., plant cells, animal cells, and microorganisms) that can 
be used to produce an isoprenoid compound such as CoQ(10). The term
"microorganism" as used herein refers to all microscopic organisms including, without 
limitation, bacteria, algae, fungi, and protozoa. It is noted that bacteria cells can be 
membraneous bacteria or non-membraneous bacteria. 

The term "non-membraneous bacteria" as used herein refers to any bacteria 
lacking intracytoplasmic membrane. The term "membraneous bacteria" as used herein
refers to any naturally-occurring, genetically modified, or environmentally modified
bacteria having an intracytoplasmic membrane. An intracytoplasmic membrane can be
organized in a variety of ways including, without limitation, vesicles, tubules, thylakoid- 
like membrane sacs, and highly organized membrane stacks. Any method can be used to 
analyze bacteria for the presence of intracytoplasmic membranes including, without 
limitation, electron microscopy, light microscopy, and density gradients. See, e.g., Chory
et al., J. Bacteriol., 159:540-554 (1984); Niederman and Gibson, Isolation and
Physicochemical Properties of Membranes from Purple Photosynthetic Bacteria. In: The
Photosynthetic Bacteria, Ed. by Roderick K. Clayton and William R. Sistrom, Plenum
Press, pp. 79-118 (1978); and Lueking et al., J. Biol. Chem., 253:451-457 (1978).

Examples of membraneous bacteria that can be used herein include, without limitation,
bacteria of the Rhodospirillaceae family such as those in the genus Rhodobacter (e.g., R.
sphaeroides, R. capsulatus, R. sulfidophilus, R. adriaticus, and R. veldkampii), the genus
Rhodospirillum (e.g., R. rubrum, R. photometricum, R. molischianum, R. fulvum, and R.
salinarum), the genus Rhodopseudomonas (e.g., R. palustris, R. viridis, and R.
sulfoviridis), the genus Rhodomicrobium, the genus Rhodocyclus, and the genus
Rhodopila; bacteria of the Chromatiaceae family such as those in the genus Chromatium,
the genus Thiocystis, the genus Thiospirillum, the genus Thiocapsa, the genus Lamprobacter,
the genus Lamprocystis, the genus Thiodictyon, the genus Amoebobacter, and the genus
Thiopedia; green sulfur bacteria such as those in the genus Chlorobium and the genus
Prosthecochloris; bacteria of the Methylococcaceae family such as those in the genus
Methylococcus (e.g., M. capsulatus), and the genus Methylomonas (e.g., M. methanica);
and particular bacteria of the Nitrobacteraceae family such as those in the genus
Nitrobacter (e.g., N. winogradskyi and N. hamburgensis), the genus Nitrococcus (e.g., N.
mobilis), and the genus Nitrosomonas (e.g., N. europaea).

Membraneous bacteria can be highly membraneous bacteria. The term "highly
membraneous bacteria" as used herein refers to any bacterium having more
intracytoplasmic membrane than R. sphaeroides (ATCC 17023) cells have after the R.
sphaeroides (ATCC 17023) cells have been (1) cultured chemoheterotrophically under 
aerobic conditions for four days, (2) cultured chemoheterotrophically under oxygen-
limited conditions for four hours, and (3) harvested. The aerobic culture conditions
involve culturing the cells in the dark at 30°C in the presence of 25 percent oxygen. The
oxygen-limited conditions involve culturing the cells in the light at 30°C in the presence
of 2 percent oxygen. After the four hour culturing step under oxygen-limited conditions, 
the R. sphaeroides (ATCC 17023) cells are harvested by centrifugation and analyzed. 

Typically, any cell (e.g., membraneous bacteria) can be genetically modified such
that a particular isoprenoid compound is produced. Such cells can contain exogenous
nucleic acid that encodes a polypeptide having enzymatic activity. For example, a 
microorganism having endogenous DDS activity can be transformed with an exogenous 
nucleic acid that encodes a polypeptide having DDS activity. In this case, the 
microorganism can have increased DDS activity which can lead to an increased
production of CoQ(10). Thus, a cell can be given an exogenous nucleic acid that encodes
a polypeptide having an enzymatic activity that catalyzes the production of a compound 
normally produced by that cell. In this case, the genetically modified cell can produce 
more of the compound, or can produce the compound more efficiently, than a similar cell 
not having the genetic modification. Alternatively, a cell can be given an exogenous
nucleic acid that encodes a polypeptide having an enzymatic activity that catalyzes the
production of a compound that is not normally produced by that cell. 

The invention provides cells containing exogenous nucleic acid that encodes a 
polypeptide having enzymatic activity that leads to an increased production of CoQ(10).
Such cells can contain nucleic acid that encodes a polypeptide having DDS activity.
Other examples include, without limitation, cells containing exogenous nucleic acid that
encodes polypeptides having DXS, ODS, SDS, DXR, 4-diphosphocytidyl-2C-methyl-D- 
erythritol synthase (e.g., ispD), 4-diphosphocytidyl-2C-methyl-D-erythritol kinase (e.g., 
ispE), and/or chorismate lyase (e.g., ubiC) activity. Nucleic acid molecules that encode 
polypeptides having such enzymatic activities can be obtained as described herein. For
example, nucleic acid encoding a polypeptide having chorismate lyase activity can be cloned
using the sequence information provided in Genbank® accession number X66619. 

Typically, microorganisms of the invention produce CoQ(10) with the yield (mg
of CoQ(10) per g of dry biomass) being at least about 5 (e.g., at least about 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 20, 25, 30, 35, or more) percent greater than that of a comparable wild-
type strain grown under similar conditions. Bacteria can produce more CoQ(10) when
grown under anaerobic conditions as compared to aerobic conditions. For example,
anaerobically cultured bacteria can produce about 3 to 4 fold more CoQ(10) than
aerobically cultured bacteria of the same species. When determining the yield of 
isoprenoid compound production for a particular cell (e.g., microorganism), any method 
can be used. See, e.g., Cohen-Bazire et al., J. Cell. Comp. Physiol., 49:25-68 (1957);
Edlund, J. Chromatogr., 425:87-97 (1988); Rousseau and Varin, J. Chromatogr. Sci.,
36:247-52 (1998); and Leray et al., J. Lipid Res., 39:2099-2105 (1998).
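
The yield comparison described above is simple arithmetic and can be sketched as follows. The function names and the numeric yields are hypothetical illustrations, not measured values; yields are expressed in mg of CoQ(10) per g of dry biomass, and the 5 percent threshold is the lower bound stated above.

```python
def percent_increase(modified_yield, wild_type_yield):
    """Percent by which the modified strain's yield exceeds the wild-type yield.

    Both yields are in mg of CoQ(10) per g of dry biomass and are assumed
    to come from cultures grown under similar conditions.
    """
    if wild_type_yield <= 0:
        raise ValueError("wild-type yield must be positive")
    return (modified_yield - wild_type_yield) / wild_type_yield * 100.0


def meets_threshold(modified_yield, wild_type_yield, threshold=5.0):
    """True if the yield increase is at least `threshold` percent."""
    return percent_increase(modified_yield, wild_type_yield) >= threshold


# Hypothetical yields chosen only to illustrate the calculation.
print(percent_increase(2.5, 2.0))  # a 25 percent increase
print(meets_threshold(2.5, 2.0))
print(meets_threshold(2.04, 2.0))  # roughly 2 percent, below the threshold
```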

The invention provides a cell containing an exogenous nucleic acid that encodes a 
polypeptide having DXS, DDS, ODS, SDS, DXR, 4-diphosphocytidyl-2C-methyl-D- 
erythritol synthase (e.g., ispD), 4-diphosphocytidyl-2C-methyl-D-erythritol kinase (e.g.,
ispE), and/or chorismate lyase (e.g., ubiC) activity. Nucleic acid molecules that encode
polypeptides having such enzymatic activities can be obtained as described herein. The 
invention also provides a cell that contains more than one different exogenous nucleic 
acid molecule with each different exogenous nucleic acid molecule encoding a 
polypeptide having a different one of the following enzymatic activities: DXS, DDS,
ODS, SDS, DXR, 4-diphosphocytidyl-2C-methyl-D-erythritol synthase (e.g., ispD), 4-
diphosphocytidyl-2C-methyl-D-erythritol kinase (e.g., ispE), and/or chorismate lyase
(e.g., ubiC) activity. For example, the invention provides a cell containing a first 
exogenous nucleic acid encoding a polypeptide having DXS activity and a second 
exogenous nucleic acid encoding a polypeptide having DDS activity. 

The invention provides a cell containing an exogenous nucleic acid containing a
dxs sequence (e.g., Stdxs sequence), dds sequence (e.g., Stdds or Rsdds sequence), dxr 
sequence (e.g., Stdxr sequence), ubiC sequence (e.g., EcUbiC sequence), or lytB 
sequence (e.g., RsLytB sequence). Such nucleic acids can be obtained as described 
herein. The invention also provides a cell that contains more than one of the following
sequences: a dxs sequence (e.g., Stdxs sequence), dds sequence (e.g., Stdds or Rsdds
sequence), dxr sequence (e.g., Stdxr sequence), ubiC sequence (e.g., EcUbiC sequence), 
or lytB sequence (e.g., RsLytB sequence). For example, the invention provides a cell 
containing a first exogenous nucleic acid containing a dds sequence and a second 
exogenous nucleic acid containing a dxs sequence. Likewise, the invention provides a
cell containing a single exogenous nucleic acid that contains a dds sequence and a dxs
sequence.

Typically, a microorganism within the scope of the invention catabolizes a hexose
carbon such as glucose. A microorganism, however, can catabolize a pentose carbon 
(e.g., ribose, arabinose, xylose, and lyxose). In other words, a microorganism within the 
scope of the invention can either utilize hexose or pentose carbon. In addition, a 
5 microorganism within the scope of the invention can use carbon sources such as methanol 
and/or organic acids (e.g., succinic acid or malic acid). 

Any cell described herein can have reduced enzymatic activity such as reduced
geranylgeranyl pyrophosphate synthase and/or magnesium protoporphyrin IX chelatase
activity. Any cell described herein can have reduced biological activity such as reduced
activity of aerobic repressor polypeptides (e.g., PPSR) or oxidation-reduction sensor
polypeptides (e.g., CBB3). In the case of multi-subunit molecules such as CBB3, the
activity of the oxidation-reduction sensor polypeptide can be reduced by inactivating one
or more of the subunits. For example, CBB3 activity can be reduced by
inactivating a single subunit of CBB3 such as the ccoN subunit.

The term "reduced" as used herein with respect to a cell and a particular activity
(e.g., particular enzymatic activity) refers to a lower level of activity than that measured
in a comparable cell of the same species. Thus, a R. sphaeroides cell lacking
geranylgeranyl pyrophosphate synthase activity is considered to have reduced
geranylgeranyl pyrophosphate synthase activity since most, if not all, comparable R.
sphaeroides cells have at least some geranylgeranyl pyrophosphate synthase activity.
Such reduced enzymatic activities can be the result of lower enzyme concentration, lower
specific activity of an enzyme, or combinations thereof.

Many different methods can be used to make a cell having reduced enzymatic
and/or biological activity. For example, a R. sphaeroides cell can be engineered to have a
disrupted enzyme-encoding locus using common mutagenesis or knock-out technology.
Alternatively, antisense technology can be used to reduce enzymatic activity. For
example, a R. sphaeroides cell can be engineered to contain a cDNA that encodes an
antisense molecule that prevents an enzyme from being made. The term "antisense
molecule" as used herein encompasses any nucleic acid that contains sequences that
correspond to the coding strand of an endogenous polypeptide. An antisense molecule
also can have flanking sequences (e.g., regulatory sequences). Thus, antisense molecules
can be ribozymes or antisense oligonucleotides. A ribozyme can have any general
structure including, without limitation, hairpin, hammerhead, or axhead structures,
provided the molecule cleaves RNA.

Cells having a reduced enzymatic and/or biological activity can be identified
using any method. For example, a R. sphaeroides cell having reduced geranylgeranyl
pyrophosphate synthase activity can be easily identified using common biochemical
methods that measure geranylgeranyl pyrophosphate synthase activity. See, e.g., Math et
al., Proc. Natl. Acad. Sci. USA, 89(15):6761-6764 (1992).

The invention provides a cell containing reduced geranylgeranyl diphosphate
synthase, aerobic repressor, and/or cbb3-type cytochrome oxidase activity. Such cells can
have reduced geranylgeranyl diphosphate synthase, aerobic repressor, and/or cbb3-type
cytochrome oxidase activity as a result of disrupting the endogenous sequences that
encode polypeptides having these activities. For example, a cell can have reduced
geranylgeranyl diphosphate synthase activity as a result of knocking out a portion of the
endogenous crtE sequence within a cell's genome; a cell can have reduced aerobic
repressor activity as a result of knocking out a portion of the endogenous ppsR sequence
within a cell's genome; and a cell can have reduced cbb3-type cytochrome oxidase
activity as a result of knocking out a portion of the endogenous ccoN sequence within a
cell's genome.

The invention also provides a cell containing non-functional crtE, ppsR, and/or
ccoN nucleic acid sequences within its genome such that the encoded polypeptide is
either mutated or not expressed. Such cells can be used to produce large amounts of
CoQ(10). The sequence of crtE can be as set forth in Genbank® accession number
AJ010302. The sequence of ppsR can be as set forth in Genbank® accession number
AJ010302 or L19596. The sequence of ccoN can be as set forth in Genbank® accession
number U58092. Knockout technology can be used to make cells containing
non-functional crtE, ppsR, and/or ccoN nucleic acid sequences.

4. Producing isoprenoid compounds
The cells described herein can be used to produce isoprenoid compounds. For
example, a microorganism having endogenous DDS activity can be transformed with
nucleic acid that encodes a polypeptide having DDS activity such that the microorganism
produces more CoQ(10) than had the microorganism not been given that nucleic acid.
Once transformed, the microorganism can be cultured under conditions optimal for
CoQ(10) production.

In addition, substantially pure polypeptides having enzymatic activity can be used
alone or in combination with cells to produce isoprenoid compounds. For example, a
preparation containing a substantially pure polypeptide having DDS activity can be used
to catalyze the formation of CoQ(10). Further, cell-free extracts containing a polypeptide
having enzymatic activity can be used alone or in combination with substantially pure
polypeptides and/or cells to produce isoprenoid compounds. For example, a cell-free
extract containing a polypeptide having DXS activity can be used to form
1-deoxyxylulose-5-phosphate, while a microorganism containing polypeptides having the
enzymatic activities necessary to catalyze the reactions needed to form CoQ(10) from
1-deoxyxylulose-5-phosphate can be used to produce CoQ(10). Any method can be used to
produce a cell-free extract. For example, osmotic shock, sonication, and/or a repeated
freeze-thaw cycle followed by filtration and/or centrifugation can be used to produce a
cell-free extract from intact cells.

It is noted that a cell, substantially pure polypeptide, and/or cell-free extract can
be used to produce a particular isoprenoid compound that is, in turn, treated chemically to
produce another compound. For example, a microorganism can be used to produce
CoQ(10), while a chemical process is used to modify CoQ(10) into a CoQ(10) derivative
such as CoQ(10) containing a polar group. Likewise, a chemical process can be used to
produce a particular compound that is, in turn, converted into an isoprenoid compound
using a cell, substantially pure polypeptide, and/or cell-free extract described herein. For
example, a chemical process can be used to produce deoxyxylulose-5-phosphate, while a
microorganism can be used to convert deoxyxylulose-5-phosphate into CoQ(10).

Typically, a particular isoprenoid compound is produced by providing a
microorganism and culturing the provided microorganism with culture medium such that
that isoprenoid compound is produced. In general, the culture media and/or culture
conditions can be such that the microorganisms grow to an adequate density and produce
the desired compound efficiently. For large-scale production processes, the following
methods can be used. First, a large tank (e.g., a 100 gallon, 200 gallon, 500 gallon, or
more tank) containing appropriate culture medium with, for example, a glucose carbon
source is inoculated with a particular microorganism. After inoculation, the
microorganisms are incubated to allow biomass to be produced. Once a desired biomass
is reached, the broth containing the microorganisms can be transferred to a second tank.
This second tank can be any size. For example, the second tank can be larger, smaller, or
the same size as the first tank. Typically, the second tank is larger than the first such that
additional culture medium can be added to the broth from the first tank. In addition, the
culture medium within this second tank can be the same as, or different from, that used in
the first tank. For example, the first tank can contain medium with xylose, while the
second tank contains medium with glucose.

Once transferred, the microorganisms can be incubated to allow for the production
of the desired isoprenoid compound. Once produced, any method can be used to isolate
the desired compound. For example, if the microorganism releases the desired isoprenoid
compound into the broth, then common separation techniques can be used to remove the
biomass from the broth, and common isolation procedures (e.g., extraction, distillation,
and ion-exchange procedures) can be used to obtain the isoprenoid compound from the
microorganism-free broth. In addition, the desired isoprenoid compound can be isolated
while it is being produced, or it can be isolated from the broth after the product
production phase has been terminated. If the microorganism retains the desired
isoprenoid compound, then the biomass can be collected and treated to release the
isoprenoid compound, and the released isoprenoid compound can be isolated.

The invention will be further described in the following examples, which do not 
limit the scope of the invention described in the claims. 

EXAMPLES 

Example 1 - Cloning nucleic acid that encodes a
Sphingomonas trueperi polypeptide having DXS activity
S. trueperi cells were obtained from the American Type Culture Collection
(ATCC Cat. No. 12417). To isolate bacterial genomic DNA, cells were grown in
100-200 mL cultures for 2-3 days at 30°C on a shaker rotating at 250 rpm. Cultured cells
were centrifuged to form a cell pellet, washed by resuspending the pellet in a solution of
10 mM Tris/1 mM EDTA, and centrifuged again as before. The cell pellets were
resuspended in 5 mL of GTE buffer per 100 mL of original culture. GTE buffer is 50
mM glucose/25 mM Tris-HCl (pH 8.0)/10 mM EDTA (pH 8.0). The bacterial cell walls
were lysed by adding lysozyme (final concentration of 1 mg/mL), Proteinase K (final
concentration of 1 mg/mL), and mutanolysin (final concentration of 5.5 µg/mL) to the
resuspended cell solution to form a lysing mixture that was incubated for 90 minutes at
37°C. After this incubation, sodium dodecyl sulfate was added to the mixture to a final
concentration of 1 percent, and additional Proteinase K was added until the concentration
in the solution was 2 mg/mL. After a 1 hour incubation at 50°C, the solution containing
the lysed cells was diluted 1:1 with fresh GTE buffer. Once diluted, sodium chloride was
added to the solution to a final concentration of 0.15 M. Polypeptides and molecules
other than nucleic acids were removed from the lysed bacterial cell solution by adding an
equal volume of an organic mixture made up of phenol, chloroform, and isoamyl alcohol
at a ratio of 25:24:1 (hereinafter referred to as PCIA). After adding PCIA, the solution
was mixed. To separate the organic phase from the DNA-containing aqueous phase, the
mixture was centrifuged at 12,000 x g for 10 minutes. The aqueous phase was transferred
to a clean tube and re-extracted with an equal volume of chloroform alone. The aqueous
and organic phases were separated by centrifugation at 3,000 x g for 10 minutes. The
aqueous phase was again removed to a new tube and treated with 2.5 mg of RNase to
degrade any bacterial RNA present. The purified DNA was recovered by adding 2.5
volumes of ethanol to the aqueous phase. After mixing the solution, the precipitated
DNA was removed by spooling it on a glass rod. The spooled DNA was rinsed with 70
percent ethanol. Once rinsed, the ethanol was allowed to evaporate by leaving the DNA
exposed to the air until dry. The dried DNA was resuspended in a solution of 10 mM Tris
(pH 8.5). The resuspended DNA was re-extracted with PCIA followed by chloroform
alone as before. The DNA was re-precipitated by adding one-tenth volume of 7.5 M
ammonium acetate and 2.5 volumes ethanol, followed by spooling, rinsing, and air
drying. The purified DNA was resuspended in 10 mM Tris (pH 8.5).

The following polymerase chain reaction (PCR) procedure was used to isolate
nucleic acid that encodes a S. trueperi polypeptide having DXS activity. Three
degenerate forward PCR primers (F1, F2, and F3) and three degenerate reverse PCR
primers (R1, R2, and R3) were designed by comparing sequences of several clones that
encode polypeptides having DXS activity (Figure 15). The sequence of each degenerate
primer was as follows:

F1: 5'-RTKATTYTMAAYGAYAAYGAAATG-3' (SEQ ID NO:53)
F2: 5'-TTTGAAGARYTVGGYWTTAACTA-3' (SEQ ID NO:54)
F3: 5'-RCAYCARGCTTAYSCVCAYAA-3' (SEQ ID NO:55)
R1: 5'-CGTGYTGYTCDGCRATHGCBAC-3' (SEQ ID NO:56)
R2: 5'-TGYTCDGCRATHGCBACRTCRAA-3' (SEQ ID NO:57)
R3: 5'-GGSCCDATRTAGTTAAWRCC-3' (SEQ ID NO:58)
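
Because the amount of each primer used depended on its degree of degeneracy, it can help to see how that degree is counted. The following is a minimal sketch, not part of the described procedure, that multiplies the number of bases represented by each IUPAC ambiguity code in a primer. The function name and the choice of primers shown are illustrative assumptions; the sequences are those listed above with whitespace removed.

```python
from math import prod

# Number of concrete bases represented by each IUPAC nucleotide code.
IUPAC_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def degeneracy(primer: str) -> int:
    """Return how many distinct sequences a degenerate primer represents."""
    return prod(IUPAC_FOLD[base] for base in primer.upper())


# Primers from the list above (illustrative subset).
PRIMERS = {
    "F3": "RCAYCARGCTTAYSCVCAYAA",
    "R1": "CGTGYTGYTCDGCRATHGCBAC",
    "R2": "TGYTCDGCRATHGCBACRTCRAA",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, {degeneracy(seq)}-fold degenerate")
```

A primer with higher degeneracy contains a smaller fraction of any one exact sequence, which is one common reason for using more of it in a reaction.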

The primers were used in all logical combinations in PCR using Taq polymerase
(Roche Molecular Biochemicals, Indianapolis, IN) and 1 ng of purified genomic DNA per
microliter of reaction mix. Each PCR reaction was conducted using a touchdown PCR
program with four cycles at each of the following annealing temperatures: 60°C, 58°C,
56°C, and 54°C, followed by 25 cycles at 52°C. Each cycle had an initial 30 second
denaturing step at 94°C and a 90 second extension step at 72°C. The program had an
initial denaturing step of 2 minutes at 94°C and final extension step of 5 minutes at 72°C.
Between about 2 µM and 12 µM of each PCR primer was used in each reaction,
depending on the degree of degeneracy. After each PCR reaction was complete, a portion
of each reaction was separated by gel electrophoresis using a 1.5 percent TAE
(Tris-acetate-EDTA) agarose gel. The results from the gel electrophoresis indicated that the
combination of degenerate primer F3 with degenerate primer R2 produced a nucleic acid
molecule of 882 bp (referred to as the F3R2 fragment). The F3R2 fragment was purified
away from the agarose gel matrix using the Qiagen Gel Extraction procedure according to
the manufacturer's instructions (Qiagen Inc., Valencia, CA). A portion of the purified
fragment was ligated into the pCRII-TOPO vector. The vector containing the F3R2
fragment was inserted into E. coli TOP10 cells using the TOPO cloning procedure
(Invitrogen, Carlsbad, CA). The transformed TOP10 cells were plated onto LB agar
plates containing 100 µg/mL of ampicillin (Amp) and 50 µg/mL of
5-Bromo-4-Chloro-3-Indolyl-β-D-Galactopyranoside (Xgal). Single white colonies were re-plated onto fresh
LB-Amp-Xgal plates and screened by PCR with the F3 and R2 primers to confirm the
presence of plasmids with the desired insert. Plasmid DNAs were obtained from bacterial
colonies using the QiaPrep Spin Miniprep Kit (Qiagen, Inc). The plasmid DNAs were
then quantified and sequenced with the M13 forward and reverse primers. Sequence
analysis indicated that the sequence of the F3R2 fragment aligned with sequences from
other nucleic acid molecules that encode polypeptides having DXS activity.
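
The touchdown program described above can be written out as an explicit cycle list. The sketch below is illustrative only: the class and function names are assumptions, and the annealing step duration is omitted because it is not stated in the text. Only the temperatures, times, and cycle counts given above are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    denature_c: int   # denaturing temperature, degrees C
    denature_s: int   # denaturing time, seconds
    anneal_c: int     # annealing temperature, degrees C (duration not stated)
    extend_c: int     # extension temperature, degrees C
    extend_s: int     # extension time, seconds


def touchdown_schedule() -> list[Cycle]:
    """Build the per-cycle schedule: 4 cycles each at 60, 58, 56, and 54 C,
    followed by 25 cycles at 52 C."""
    steps = [(60, 4), (58, 4), (56, 4), (54, 4), (52, 25)]
    cycles: list[Cycle] = []
    for anneal_c, count in steps:
        cycles.extend(Cycle(94, 30, anneal_c, 72, 90) for _ in range(count))
    return cycles


# Steps outside the cycling loop, as (temperature in C, seconds).
INITIAL_DENATURE = (94, 120)
FINAL_EXTENSION = (72, 300)

if __name__ == "__main__":
    schedule = touchdown_schedule()
    print(f"total cycles: {len(schedule)}")
    for number, cycle in enumerate(schedule, start=1):
        print(number, cycle.anneal_c)
```

Starting at a stringent annealing temperature and stepping down favors specific priming in the early cycles, which is the usual rationale for touchdown programs with degenerate primers.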

To obtain the complete coding sequence for the S. trueperi polypeptide having
DXS activity, genome walking was performed as follows. Primers were designed based
upon the sequence of the 882 bp F3R2 fragment for walking in both the upstream and
downstream directions. These walking primers had the following sequences:

GSP1F: 5'-TCGTGACCAAGAAGGGCAAGGGCTATG-3' (SEQ ID NO:59)
GSP2F: 5'-GACAAGTATCACGGCGTCCAGAAGTTC-3' (SEQ ID NO:60)
GSP1R: 5'-ATAGCCCTTGCCCTTCTTGGTCACGAC-3' (SEQ ID NO:61)
GSP2R: 5'-CGAACGGATCATACTCGCTCTCGCTG-3' (SEQ ID NO:62)
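
The GSP1F and GSP1R walking primers listed above appear to anneal to the same site on opposite strands, offset by one nucleotide, which is what lets them walk in opposite directions from a shared anchor. The following sketch checks this by reverse-complementing GSP1F; the function name is an illustrative assumption, and the sequences are those listed above.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


GSP1F = "TCGTGACCAAGAAGGGCAAGGGCTATG"  # SEQ ID NO:59
GSP1R = "ATAGCCCTTGCCCTTCTTGGTCACGAC"  # SEQ ID NO:61

if __name__ == "__main__":
    rc = reverse_complement(GSP1F)
    print("revcomp(GSP1F):", rc)
    print("GSP1R:         ", GSP1R)
    # Dropping the first base of the reverse complement and the last base of
    # GSP1R leaves the region the two primers share.
    print("share a 26 nt site:", rc[1:] == GSP1R[:-1])
```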

The GSP1F and GSP2F primers are primers that face downstream of the DXS
polypeptide start codon, while the GSP1R and GSP2R primers are primers that face in the
opposite direction. In addition, GSP2F and GSP2R are nested inside of the GSP1F and
GSP1R primers. Genome walking was conducted according to the manual of
CLONTECH's Universal Genome Walking kit (CLONTECH Laboratories, Inc., Palo
Alto, CA) with the exception that Fsp I and Sma I were used instead of Dra I and EcoR
V. The genomic DNA used was from S. trueperi. DMSO was added to the PCR mixture
until a final concentration of 5 percent was reached. The PCR reactions were performed
using a Perkin Elmer 9700 Thermocycler. The first round of PCR consisted of 7 cycles
of 2 seconds at 94°C and 3 minutes at 72°C, followed by 36 cycles of 2 seconds at 94°C
and 3 minutes at 67°C, with a final extension at 67°C for 4 minutes. The second round of
PCR consisted of 5 cycles of 2 seconds at 94°C and 3 minutes at 72°C, followed by 24
cycles of 2 seconds at 94°C and 3 minutes at 67°C, with a final extension at 67°C for 4
minutes. After the PCR was complete, a portion of the reaction mix from each round was
separated by gel electrophoresis using a 1.5 percent TAE agarose gel. Good
amplification products were obtained with the Pvu II and Stu I libraries using the GSP1F
and GSP2F primers and with the Fsp I and Pvu II libraries using the GSP1R and GSP2R
primers. The second round products from each of these libraries were gel purified, cloned
using the TOPO cloning procedure (Invitrogen, Carlsbad, CA), and sequenced. A 1.7
kilobase (kb) fragment was subcloned from the Pvu IIF library, a 2.8 kb fragment was
subcloned from the Stu IF library, a 400 bp fragment was subcloned from the Fsp IR
library, and a 330 bp fragment was subcloned from the Pvu IIR library. Each of these
subcloned fragments was sequenced. Sequence analysis indicated that each subcloned
fragment contained a sequence that overlapped with that of the F3R2 fragment and was
similar to other nucleic acid sequences that encode polypeptides having DXS activity.

Because the sequence information obtained by genome walking extended only 13 bp
upstream of the translational start codon, a second genome walk was conducted to gain
additional sequence information. This second walk used GSPB2R,
5'-TGAGGATCTTGTGCGGATAGCATTGGTG-3' (SEQ ID NO:63), as the first round
primer and GSPB3R, 5'-AGCGGCGTCTTGGGTAGGTCAGCCAT-3' (SEQ ID
NO:64), as the second round primer. The second walk was conducted using only the Sma
I and Stu I libraries. CLONTECH's Advantage-GC Genomic Polymerase was used for
PCR with a 1.0 mM GC Melt concentration according to the manufacturer's
specifications. The first round of PCR was conducted using a Perkin Elmer 9700
Thermocycler with an initial denaturing step at 96°C for 5 seconds followed by 7 cycles
consisting of 2 seconds at 94°C and 3 minutes at 72°C, followed by 36 cycles consisting
of 2 seconds at 94°C and 3 minutes at 66°C, with a final extension at 66°C for 4 minutes.
The second round of PCR had 5 cycles consisting of 2 seconds at 94°C and 3 minutes at
72°C, followed by 26 cycles consisting of 2 seconds at 94°C and 3 minutes at 66°C, with
a final extension at 66°C for 4 minutes. Portions of the PCR products from each round
were separated by gel electrophoresis using a 1.5 percent TAE agarose gel. The gel
electrophoresis revealed the presence of a 250 bp amplification product obtained from the
second round of PCR using the Stu I library. This fragment was gel purified, cloned
using the TOPO cloning procedure (Invitrogen, Carlsbad, CA), and sequenced. An
overlap with the previously obtained sequence was found, extending the length of the
clone to 181 bp before the start codon. The full-length clone containing coding and
non-coding sequence was 3626 bp in length (Figure 2). The open reading frame was 1926 bp
in length (Figure 3), which encoded a polypeptide with 641 amino acid residues (Figure
4).
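
The reported open reading frame length and polypeptide length agree with each other if the 1926 bp figure includes the stop codon, which is an assumption made here rather than something the text states. A minimal arithmetic check, with an illustrative function name:

```python
def encoded_residues(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acid residues encoded by an open reading frame.

    Assumes the ORF length is a whole number of codons and, by default,
    that the final codon is a stop codon that encodes no residue.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


if __name__ == "__main__":
    # 1926 bp -> 642 codons -> 641 residues plus one stop codon.
    print(encoded_residues(1926))
```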

The coding sequence of the DXS polypeptide was amplified by PCR using S.
trueperi genomic DNA as template. Primers were designed based on the sequence
obtained above. The sequences of the primers were as follows:

SHDXF1: 5'-ATATGGTACCGTGTGACTGACCTGTCCAAC-3' (SEQ ID NO:65)
SHDXR1: 5'-AGTCTCTAGAATGTTGGAGATTCAAGGTGG-3' (SEQ ID NO:66)



These primers were designed to introduce a Kpn I restriction site at the beginning of the
amplified fragment and an Xba I restriction site at the end of the amplified fragment. The
sequence of each restriction site is underlined. The PCR reaction mix contained the
following: 100 ng genomic DNA, 2 µL of each primer (SHDXF1 and SHDXR1, each at
50 nM), 10 µL 10X Pfu Plus buffer, 5 µL DMSO, 8 µL dNTPs (10 µM each) and 5 units
Pfu polymerase in a final volume of 100 µL. Each PCR reaction was performed in a
Perkin Elmer Geneamp PCR system 2400 under the following conditions: an initial
denaturation at 94°C for 5 minutes; 8 cycles of (1) 94°C for 45 seconds, (2) 55°C for 45
seconds, and (3) 72°C for 3 minutes; 21 cycles of (1) 94°C for 45 seconds, (2) 61°C for
45 seconds and (3) 72°C for 3 minutes; and a final extension of 72°C for 10 minutes. A
portion of the PCR reaction was separated by gel electrophoresis using a 0.8 percent
TAE gel. The gel electrophoresis revealed a 1.6 kb fragment. This fragment was (1)
purified using a Qiagen Gel Extraction kit (Qiagen Inc., Valencia, CA), (2) treated with
Kpn I and Xba I (New England BioLabs, Inc., Beverly, MA), and (3) subcloned into
pUC18 that had also been treated with Kpn I and Xba I and gel purified. The resulting
construct designated as pUC18-SHDXS is depicted in Figure 18. The ligation was carried
out with T4 DNA ligase at 16°C for 16 hours. Once ligated, 1 µL was used to
electroporate E. coli ElectroMAX™ DH10B™ cells (Life Technologies, Inc., Rockville,
MD). The electroporated cells were plated on LB-Amp plates (Amp concentration = 100
µg/mL). From these plates, eight individual colonies were chosen at random. The
plasmid was isolated from each colony using a QiaPrep Spin Miniprep kit (Qiagen Inc.,
Valencia, CA). The extracted plasmid DNA was examined for the presence of the 1.6 kb
fragment by digesting individual aliquots with one of three different restriction enzymes:
EcoR I, BamH I, and Nar I. If the plasmids contained the correct 1.6 kb fragment, the
EcoR I digest reaction would result in two fragments (0.77 and 4.13 kb), the BamH I
digest reaction would result in one fragment (4.8 kb), and the Nar I digest reaction would
result in two fragments (1.9 and 2.9 kb). After treating with the restriction enzymes, the
digest reactions were separated by gel electrophoresis using a 0.8 percent TAE agarose
gel. All 8 clones yielded digestion fragments consistent with a clone of 1.6 kb.

Example 2 - Introducing nucleic acid that
encodes a polypeptide having DXS activity into cells
The nucleic acid molecule that encodes a polypeptide having DXS activity and
was obtained as described in Example 1 is introduced into cells as follows. First, a
construct is made to contain the nucleic acid molecule such that the encoded polypeptide
having DXS activity is expressed in a desired host cell. When using prokaryotic cells, a
construct functional in prokaryotic cells is used. When using eukaryotic cells, a construct
functional in eukaryotic cells is used. Second, the construct is introduced into the desired
host cell using appropriate methods. Once introduced, stable transformants are selected.

Example 3 - Cloning nucleic acid that encodes
a Rhodobacter sphaeroides polypeptide having DDS activity
R. sphaeroides ATCC strain 17023 cells were grown in 550 R 8 A H media at
30°C and 100 rpm. The recipe for 550 R 8 A H media was provided by ATCC. Genomic
DNA was isolated from R. sphaeroides cells as described in Example 1.

To isolate nucleic acid encoding an R. sphaeroides polypeptide having DDS
activity, degenerate primers were designed and used as described in Example 1. Briefly,
three degenerate forward primers (F4, F5, and F6) and four degenerate reverse primers
(R4, R5, R6, and R7) were designed by comparing sequences of several clones that
encode polypeptides having DDS, SDS, or ODS activity (Figure 16). The sequence of each
degenerate primer was as follows:

F4: 5'-GGWGGHAARMGMMTKCGYCC-3' (SEQ ID NO:67)
F5: 5'-ACWYTGSTDCATGATGATGT-3' (SEQ ID NO:68)
F6: 5'-ACNYTNBTNCAYGAYGAYGT-3' (SEQ ID NO:69)
R4: 5'-TYRTCYACSACATCATCATG-3' (SEQ ID NO:70)
R5: 5'-TGHAVKACYTCACCYTCRGMAAT-3' (SEQ ID NO:71)
R6: 5'-TARTCNARDATRTCRTCDAT-3' (SEQ ID NO:72)
R7: 5'-TCRTCNCCNAYNKTYTTNCC-3' (SEQ ID NO:73)



These primers were used in all logical combinations in PCR using Taq polymerase
(Roche Molecular Biochemicals, Indianapolis, IN) and 1 ng of genomic DNA per
microliter of reaction mix. PCR was conducted using the touchdown PCR program as
described in Example 1. Between about 4 µM and 8 µM of each PCR primer was used in
each reaction, depending on the degree of degeneracy. After each PCR reaction was
complete, a portion of each reaction was separated by gel electrophoresis using a 1.5
percent TAE agarose gel. The results from the gel electrophoresis yielded no fragments
of the expected size. A second amplification reaction was then performed using each
sample from the first round of PCR. Briefly, one µL of reaction mixture from each first
round of PCR was used in a 50 µL amplification reaction using the same primer pairs and
thermocycling parameters used in the first round of PCR. A portion of each of the second
round PCR reactions was separated by gel electrophoresis using a 1.5 percent TAE
agarose gel. The combination of degenerate primers F6 and R5 produced a fragment of
209 bp (referred to as the F6R5 fragment). The F6R5 fragment was isolated from an
agarose gel and purified using the Qiagen Gel Extraction procedure (Qiagen Inc.,
Valencia, CA). An aliquot of the purified fragment was ligated to pCRII-TOPO, and the
product of the ligation reaction was inserted into TOP10 E. coli cells using a TOPO
cloning procedure (Invitrogen, Carlsbad, CA). The products of the individual insertion
reactions were plated onto LB media containing 100 µg/mL Amp and 50 µg/mL Xgal.
Single white colonies that grew on the LB-Amp-Xgal plates were re-plated onto fresh
LB-Amp plates and screened in a PCR reaction using the F6 and R5 primers to confirm
the presence of the desired insert. Plasmid DNAs were obtained from several colonies
using a QiaPrep Spin Miniprep kit (Qiagen, Inc). The obtained plasmid DNAs were
quantified and sequenced with the M13 forward and reverse primers. Sequence analysis
revealed that the F6R5 fragment contained sequences that aligned with sequences from
other nucleic acid molecules that encode polypeptides having polyprenyl diphosphate
synthase activity.

Genome walking was performed to obtain a complete coding sequence for the R.
sphaeroides DDS polypeptide using procedures similar to those described in Example 1.
Briefly, primers were designed based on the sequence of the F6R5 fragment for walking
in both the upstream and downstream directions. These primers had the following
sequences:

GSP3F: 5'-TGGAAGCTGCGGGCGAAGAGATAGTC-3' (SEQ ID NO:74)
GSP4F: 5'-CCCACCAGCACCGAGGATTTGTTGTC-3' (SEQ ID NO:75)
GSP3R: 5'-GAACCTGCTGTGGGACAACAAATCCTC-3' (SEQ ID NO:76)
GSP4R: 5'-TCGGTGCTGGTGGGCGACTATCTCTTC-3' (SEQ ID NO:77)

The GSP3F and GSP4F primers are primers that face downstream of the DDS
polypeptide start codon, while the GSP3R and GSP4R primers are primers that face in the
opposite direction. In addition, the GSP4F and GSP4R primers are nested inside the
GSP3F and GSP3R primers.

The Pvu II, Fsp I, and Stu I libraries with the GSP3F and GSP4F primers and all
four libraries with the GSP3R and GSP4R primers resulted in the production of amplified
fragments. A 750 bp fragment from the Pvu II library, a 500 bp fragment from the Fsp I
library, a 1.4 kb fragment from the Stu I library, and a 0.9 kb fragment from the Sma I
library were all subcloned and sequenced. Sequence analysis indicated that each
subcloned fragment contained a sequence that overlapped with the sequence of the F6R5
fragment and was similar to other nucleic acid sequences that encode polypeptides having
polyprenyl diphosphate synthase activity. The full-length clone containing coding and
non-coding sequence was 1990 bp in length (Figure 7). The open reading frame was
1002 bp in length (Figure 8), which encoded a polypeptide with 333 amino acid residues
(Figure 9).

The coding sequence of the DDS polypeptide from R. sphaeroides was amplified
by PCR using R. sphaeroides genomic DNA as template. PCR primers were designed
based on the sequences obtained as described above. The sequences of the primers were
as follows:

RDS18F: 5'-ACTAGAATTCCGCAACAGTTCCTTCATGTC-3' (SEQ ID NO:78)
RDS18R: 5'-ATAGAAGCTTACTTGCGGTCGGACTGATAG-3' (SEQ ID NO:79)

These primers were designed to introduce an EcoR I restriction site at the beginning of
the amplified fragment and a Hind III restriction site at the end of the amplified fragment.
The sequence of each restriction site is underlined. The PCR reaction mix contained the
following: 100 ng genomic DNA, 2 µL of each primer (RDS18F and RDS18R, each at 50
µM), 10 µL 10X Pfu Plus buffer, 5 µL DMSO, 8 µL dNTPs (10 mM each) and 5 units Pfu
polymerase in a final volume of 100 µL. Each PCR reaction was performed in a Perkin
Elmer Geneamp PCR system 2400 under the following conditions: an initial denaturation
at 94°C for 5 minutes; 8 cycles of (1) 94°C for 45 seconds, (2) 55°C for 45 seconds, and
(3) 72°C for 3 minutes; 21 cycles of (1) 94°C for 45 seconds, (2) 61°C for 45 seconds,
and (3) 72°C for 3 minutes; and a final extension of 72°C for 10 minutes. After
completing the PCR reactions, each PCR reaction was separated by gel electrophoresis
using a 0.8 percent TAE agarose gel. The gel electrophoresis revealed a 1.6 kb fragment.
This fragment was (1) purified from the agarose gel using a Qiagen Gel Extraction kit, (2)
digested with EcoR I and Hind III (New England BioLabs, Beverly, MA), and (3) ligated
to pUC18 that had also been digested with EcoR I and Hind III and gel purified. The
resulting construct designated as pUC18-RSdds is depicted in Figure 19. The ligation was
carried out with T4 DNA ligase at 16°C for 16 hours. Once ligated, one µL of the
ligation reaction was used to electroporate E. coli ElectroMAX™ DH10B™ cells (Life
Technologies, Inc., Rockville, MD). The electroporated cells were plated onto LB-Amp
plates (Amp concentration was 100 µg/mL). From these LB-Amp plates, eight individual
colonies were selected at random, and the plasmids within these colonies were purified
using a Qiaprep Spin Miniprep kit. These purified plasmids were evaluated for the
presence of inserts by restriction enzyme analysis. If the plasmids contained the correct
1.6 kb fragment, then an EcoR I and Hind III digest reaction would result in two
fragments (2.6 and 1.6 kb), and a BamH I digest reaction would result in one fragment
(4.2 kb). After treating with the restriction enzymes, the digest reactions were separated
by gel electrophoresis using a 0.8 percent TAE agarose gel. Of the eight clones tested,
four contained the desired 1.6 kb fragment.
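
The expected digest patterns above follow from where each enzyme cuts a circular plasmid. The sketch below illustrates the arithmetic only. The site coordinates are hypothetical placeholders chosen so that the insert spans 1.6 kb of a 4.2 kb construct; they are not taken from the actual construct map, and the function name is an assumption.

```python
def circular_fragments(plasmid_bp: int, cut_sites: list[int]) -> list[int]:
    """Fragment sizes (largest first) after cutting a circular plasmid.

    cut_sites are positions in bp along the circle. With no sites the
    plasmid stays circular and its full length is returned as one entry.
    """
    sites = sorted({site % plasmid_bp for site in cut_sites})
    if not sites:
        return [plasmid_bp]
    fragments = []
    for index, site in enumerate(sites):
        next_site = sites[(index + 1) % len(sites)]
        # A single site wraps onto itself, giving one full-length linear piece.
        fragments.append((next_site - site) % plasmid_bp or plasmid_bp)
    return sorted(fragments, reverse=True)


# Hypothetical coordinates for a 4.2 kb construct (backbone plus 1.6 kb insert).
PLASMID_BP = 4200
ECOR_I = 0        # assumed position of the EcoR I site at one end of the insert
HIND_III = 1600   # assumed position of the Hind III site at the other end
BAMH_I = 20       # assumed position of a single BamH I site

if __name__ == "__main__":
    print("EcoR I + Hind III:", circular_fragments(PLASMID_BP, [ECOR_I, HIND_III]))
    print("BamH I:", circular_fragments(PLASMID_BP, [BAMH_I]))
```

Under these assumed coordinates the double digest gives 2.6 kb and 1.6 kb pieces and the single-site digest gives one 4.2 kb piece, matching the pattern described for a correct clone.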



Example 4 - Cloning nucleic acid that encodes
a Sphingomonas trueperi polypeptide having DPS activity
S. trueperi cells were grown as described in Example 1. In addition, genomic
DNA was isolated from S. trueperi cells as described in Example 1.

To isolate nucleic acid encoding a polypeptide having DDS activity from S.
trueperi, a strategy similar to that described in Example 3 was employed. In this case,
four degenerate forward primers (SF1, SF2, SF3, and SF4) and four degenerate reverse
primers (SR1, SR2, SR3, and SR4) were designed by comparing sequences of several clones
that encode polypeptides having polyprenyl diphosphate synthase activity (Figure 17).
Codon usage tables from twelve Sphingomonas species were used to develop an average
preferred codon table that was used in primer design. The sequence of each degenerate
primer was as follows:



SF1: 5'-CTSSTSCAYGAYGAYGTSGTSGA-3' (SEQ ID NO:80)
SF2: 5'-GTSGMVGSSGGSGGSAARC-3' (SEQ ID NO:81)
SF3: 5'-CTSMTSCAYGAYGAYGTS-3' (SEQ ID NO:82)
SF4: 5'-DSSRTBCTSGTSGGSGAYTT-3' (SEQ ID NO:83)
SR1: 5'-VAKRAARTCSCCSACSAGSAC-3' (SEQ ID NO:84)
SR2: 5'-SACYTCSCCYTCSGCRAT-3' (SEQ ID NO:85)
SR3: 5'-RTCRTCSCCVAYVKTYTTSCC-3' (SEQ ID NO:86)
SR4: 5'-SGGSAGSGTVRBYTTSCCYTC-3' (SEQ ID NO:87)
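
Because the primers listed above are written with IUPAC ambiguity codes, each one represents a pool of distinct oligonucleotides. The following Python sketch shows one way to count the size of that pool; the function name is an illustrative choice and is not part of the described method.

```python
# Number of bases represented by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Return the number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


sf1 = degeneracy("CTSSTSCAYGAYGAYGTSGTSGA")  # SF1
sr2 = degeneracy("SACYTCSCCYTCSGCRAT")       # SR2
print(sf1, sr2)
```

A more degenerate primer contains less of any one exact sequence at a given total primer concentration, which is one reason the amount of primer may be adjusted according to the degree of degeneracy.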



The primers were used in all logical combinations in PCR using Taq polymerase
(Roche Molecular Biochemicals, Indianapolis, IN) and 1 ng of genomic DNA per
microliter of reaction mix. PCR was conducted using the touchdown PCR program as
described in Example 1. Between about 4 µM and 20 µM of each PCR primer was used in
each reaction depending on the degree of degeneracy. After each PCR reaction was
complete, a portion of each reaction was separated by gel electrophoresis using a 1.5
percent TAE agarose gel. Each PCR reaction produced several amplified fragments of
the expected sizes based on the coding sequences of other polyprenyl diphosphate
synthase polypeptides. These fragments were isolated from TAE agarose gels and
purified using the Qiagen Gel Extraction procedure (Qiagen Inc., Valencia, CA). An
aliquot of each purified fragment was ligated into pCRII-TOPO. The ligated plasmids
were then inserted into TOP10 E. coli cells using a TOPO cloning procedure (Invitrogen,
Carlsbad, CA). The products of each of the individual insertion reactions were plated on
LB-Amp-Xgal plates as described in Examples 1 and 3. Single white colonies that grew
on the LB-Amp-Xgal plates were re-plated onto fresh LB-Amp-Xgal plates and screened
in a PCR reaction using the initial degenerate primers to confirm the presence of the
desired insert. Plasmid DNAs having the desired insert were obtained from multiple
colonies using a QiaPrep Spin Miniprep Kit (Qiagen, Inc). The obtained plasmid DNAs
were then quantified and sequenced using the M13 forward and reverse primers.
Sequence analysis revealed that a 201 bp fragment produced using the SF1 and SR2
degenerate primers, a 476 bp fragment produced using the SF1 and SR4 primers, and a
206 bp fragment produced using the SF3 and SR2 primers contained sequences similar to
the coding sequences of other polyprenyl diphosphate synthases.

Genome walking was performed to obtain a complete coding sequence for the S. 
trueperi DDS polypeptide using procedures similar to those described in Example 1. 
Briefly, primers were designed based on the sequences of the obtained fragments. These 
primers had the following sequences: 

GSP5F: 5'-GTGCTGGTCGGCGACTTCCTGTTCAG-3' (SEQ ID NO:88)
GSP6F: 5'-ATCGACCTOTCCGAGGATCGCTATCTC-3' (SEQ ID NO:89)
GSP5R: 5'-TCGAACGAGCGGCTGAACAGGAAGTC-3' (SEQ ID NO:90)
GSP6R: 5'-TGGCGGGATTGCCCCAGATGATGTTG-3' (SEQ ID NO:91)

The GSP5F and GSP6F primers face downstream of the DDS start codon, while
the GSP5R and GSP6R primers face in the opposite direction. In addition, the
GSP6F and GSP6R primers are nested inside the GSP5F and GSP5R primers.

Genome walking was conducted as described in Example 3 with the exception
that the 36 cycles had 3 minute incubations at 66°C instead of 67°C and the final
extension was performed at 66°C instead of 67°C for both the first and second rounds of
PCR. Portions of the PCR reactions from each round were separated by gel
electrophoresis using a 1.5 percent TAE agarose gel. PCR on the Fsp I and Stu I libraries
with the forward primers and of all four libraries with the reverse primers resulted in the
production of an amplified fragment. A 1.4 kb fragment from the Fsp I library, a 1.1 kb
fragment from the Stu I library (forward primer), a 2.0 kb fragment from the Pvu II
library (forward primer), and a 3.0 kb fragment from the Stu I library (reverse primer)
were gel purified, cloned using the TOPO cloning procedure, and sequenced as described
in Examples 1 and 3. The sequencing analysis revealed that these fragments contained
sequences that overlapped with the sequence of the initially obtained fragments and were
similar to the coding sequences of other polyprenyl diphosphate synthases. The
full-length clone containing coding and non-coding sequence was 1833 bp in length
(Figure 10). The open reading frame was 1014 bp in length (Figure 11), which encoded a
polypeptide with 337 amino acid residues (Figure 12).
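
The relationship between the reported open reading frame length and the polypeptide length can be checked with simple arithmetic. The Python sketch below assumes that the stated open reading frame length includes the stop codon; the function name is illustrative.

```python
def residues_from_orf(orf_bp, includes_stop=True):
    """Return the number of amino acid residues encoded by an ORF of orf_bp."""
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    codons = orf_bp // 3
    # The stop codon does not encode a residue.
    return codons - 1 if includes_stop else codons


dds_residues = residues_from_orf(1014)
print(dds_residues)
```

With a 1014 bp open reading frame, this gives 338 codons, one of which is the stop codon, consistent with the 337 residues stated above.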

The coding sequence of the DDS polypeptide from S. trueperi was amplified by 
PCR using S. trueperi genomic DNA as template. PCR primers were designed based on 
the sequences obtained as described above. The sequences of the primers were as 
follows. 

SHDDSF: 5'-ATTAGGTACCATCAGATAATCGTCGCTCAA-3' (SEQ ID NO:92)
SHDDSR: 5'-TATAGGATCCGACATGGACGAGGAAGACGC-3' (SEQ ID NO:93)

These primers were designed to introduce a Kpn I restriction site at the beginning
of the amplified fragment and a BamH I restriction site at the end of the amplified fragment.
Each primer contains the corresponding restriction site sequence (GGTACC for Kpn I;
GGATCC for BamH I). The PCR reactions were performed as
described in Example 3 with the exception that primers SHDDSF and SHDDSR were
used instead of RDS18F and RDS18R. Once the PCR was completed, the PCR reactions
were separated by gel electrophoresis using a 0.8 percent TAE agarose gel. The gel
electrophoresis revealed a 1.6 kb fragment. This 1.6 kb fragment was (1) purified using a
Qiagen Gel Extraction kit, (2) digested with Kpn I and BamH I (New England BioLabs),
and (3) ligated into pUC18 that had also been digested with Kpn I and BamH I and gel
purified using methods similar to those described in Example 3. The resulting construct,
designated as pUC18-SHDDS, is depicted in Figure 20. This construct was used to
transform cells as described in Example 3. The transformed cells were plated onto
LB-Amp plates, and eight individual colonies were selected at random. Plasmid DNA was
isolated from each colony using a QiaPrep Spin Miniprep kit. The extracted plasmid
DNA was tested for the presence of the 1.6 kb fragment using three different restriction
digests. If the plasmids contained the 1.6 kb fragment, then a BamH I and Kpn I digest
would yield two fragments (2.68 and 1.62 kb), an EcoR I digest would yield two
fragments (1.45 and 2.85 kb), and a Ban II digest would yield two fragments (0.48 and
3.8 kb). All eight plasmids tested yielded digestion fragments consistent with a plasmid
containing the desired 1.6 kb fragment.
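
The restriction sites engineered into the SHDDSF and SHDDSR primers can be located programmatically. The following Python sketch uses the standard recognition sequences for Kpn I (GGTACC) and BamH I (GGATCC); the dictionary and function names are illustrative.

```python
SITES = {"KpnI": "GGTACC", "BamHI": "GGATCC"}

PRIMERS = {
    "SHDDSF": "ATTAGGTACCATCAGATAATCGTCGCTCAA",
    "SHDDSR": "TATAGGATCCGACATGGACGAGGAAGACGC",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: 0-based index} for each recognition site found in seq."""
    return {name: seq.find(motif) for name, motif in sites.items() if motif in seq}


for name, seq in PRIMERS.items():
    print(name, find_sites(seq))
```

In both primers the site follows a four-base 5' extension, which leaves a few flanking bases next to the recognition sequence at the end of the PCR product.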

Example 5 - Measuring CoQ(10)

Harvested cells were suspended in water to have about 0.1 gm dry weight per mL.
The suspension was subjected to a French press, and the resulting suspension was
frozen in 1 mL aliquots until used.

To measure CoQ(10) in a sample, two aliquots were repeatedly thawed and
refrozen 4-5 times. Once transferred to a 50 mL centrifuge tube, 1 mL of 5% sodium
dodecyl sulfate was added to the thawed material. The material was then flushed with
nitrogen. After vortexing for one minute, six mL of ethanol was added to the material,
and the resulting mixture was vortexed for one minute. Then, 15 mL of hexane was
added to the mixture. After vortexing for five minutes, the mixture was centrifuged at
3000 rpm for ten minutes. Once centrifuged, the hexane layer was removed to a conical
flask and flushed with nitrogen. This hexane extraction was repeated two times. The
three extracts were pooled into a single tube that was evaporated on a vacuum evaporator
until the residue was near dryness. The residue was dissolved in 2 mL of mobile phase by
vortexing for 2-3 minutes. Once vortexed, the solution was transferred to a 5 mL
volumetric flask. The tube that contained the residue was rinsed two additional times
with 1 mL of mobile phase. Each time the rinse solution was transferred to the same 5
mL volumetric flask. After adjusting the total volume to 5 mL, the solution was mixed
well and stored at -20°C until analyzed.

As a control, either water or a culture solution was spiked with standard CoQ(10),
extracted as indicated above, and analyzed to determine the recovery of the spiked
material. The CoQ(10) standard was a stock solution of CoQ(10), obtained from Sigma.
The stock solution was made in HPLC grade ethanol at a concentration of 100 µg/mL,
and then diluted to get CoQ(10) solutions ranging from 100 µg/mL to 1 µg/mL.

HPLC analysis was performed with the following parameters. The mobile phase
was ethanol:methanol (7:3) or methanol:isopropylether (9:1). The flow rate was 0.75
mL/min. The column was Waters Nova-Pak C18 (3.9 x 150 mm; 4 µm). The detector
was a PDA set from 200-300 nm with the resolution at 1.2 nm and the maximum
absorbance at 275 nm. The run time was 15 minutes, and the injection volume was 50
µL. To calculate the amount of CoQ(10) present, 50 µL of each sample was injected, and
the results compared to those obtained using the calibration curve. From these data
points, the concentration per gm dry weight was calculated.
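
The final conversion from a measured concentration to an amount per gram dry weight can be sketched as follows. This Python example is a sketch under stated assumptions: the 5 mL final volume comes from the extraction procedure above, the 0.2 g dry weight assumes two 1 mL aliquots at about 0.1 gm dry weight per mL, and the 10 µg/mL input is a hypothetical reading, not a reported result.

```python
def coq10_per_g_dry_weight(conc_ug_per_ml, final_volume_ml=5.0, dry_weight_g=0.2):
    """Convert an HPLC concentration (ug/mL) to ug CoQ(10) per g dry weight.

    final_volume_ml: volume of the volumetric flask holding the extract.
    dry_weight_g: assumed dry weight of cells that went into the extraction.
    """
    total_ug = conc_ug_per_ml * final_volume_ml
    return total_ug / dry_weight_g


example = coq10_per_g_dry_weight(10.0)  # hypothetical 10 ug/mL reading
print(round(example, 1))
```

Where a measured dry weight is available for the sample, it would replace the nominal 0.2 g value used here.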

Example 6 - Introducing nucleic acid that encodes a polypeptide
having DPS activity into cells and measuring isoprenoid levels
The following procedures were followed individually for the R. sphaeroides and
S. trueperi nucleic acid isolated as described in Examples 3 and 4, respectively.
Plasmid DNA encoding the polypeptide having DDS activity was electroporated
into wild type E. coli strain MG1655. The electroporated cells were plated onto LB-Amp
plates. A single individual bacterial colony was picked for each DDS coding sequence,
and each colony was grown overnight in 2 mL of LB-Amp at 37°C with 200 rpm shaking.
About 0.75 mL of these overnight cultures were used to inoculate flasks containing 75
mL LB-Amp medium (Amp concentration was 100 µg/mL). These second cultures were
grown at 37°C at 200 rpm for 30 hours. Additional Amp (to a final concentration of 50
µg of fresh Amp per mL) was added to each flask after 12 hours of growth. After 30
hours, the bacteria were collected by centrifugation at 8,000 g for 10 minutes. The
resulting bacterial cell pellets were washed by adding 20 mL of 10 mM Tris-HCl buffer
(pH 8.0), resuspending the cells, and re-centrifuging as before. Each cell pellet was then
resuspended in 10 mL of water. About 0.5 mL of each extract was used for dry mass
analysis and the remaining cell suspensions (about 9.5 mL) were frozen at -20°C
overnight.

The 9.5 mL cell suspensions were used as follows. First, the cells were thawed on
ice and lysed by passing the cell suspensions through a French press three times (14,000
psi pressure). The resulting cell extracts were frozen at -20°C in 1 mL aliquots and
maintained on ice prior to analysis.

High pressure liquid chromatography was performed using Waters' 2690 Alliance
integrated system (Waters Corporation, Milford, Mass). Prior to analysis, all samples and
standards were dissolved in HPLC-grade ethanol, loaded into the built-in auto-sampler,
and kept at 5-10°C in the dark. The separation was carried out using an isocratic elution
program of 70:30 ethanol/methanol (v/v) at a flow rate of 1.0 mL/min. The column was a
Waters Nova-Pak C18, 3.9 x 150 mm, equipped with a guard column of the same stationary
phase. The injection volume was typically 10-25 µL. Total run time was ten minutes.

Under these conditions, retention times were 3.1 and 4.9 minutes for CoQ(8) and
CoQ(10), respectively. For quantification purposes, a four-point external calibration
curve was calculated using freshly prepared CoQ(10) standards. Calibration levels were
1.0, 4.0, 10.0 and 100.0 µg/mL (ppm). Each standard was injected in triplicate, and the
resulting calibration plot was linearly fitted with observed r² values of >0.999.
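
A four-point external calibration of this kind can be illustrated with an ordinary least-squares fit. In the Python sketch below, the calibration levels are those stated above, but the peak areas are hypothetical placeholder values, and the helper names are illustrative rather than part of any instrument software.

```python
def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept, with r squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


levels_ug_per_ml = [1.0, 4.0, 10.0, 100.0]   # calibration levels from the text
peak_areas = [2.1e4, 8.0e4, 2.02e5, 2.0e6]   # hypothetical detector responses

slope, intercept, r_squared = linear_fit(levels_ug_per_ml, peak_areas)


def concentration_from_area(area):
    """Invert the calibration line to estimate concentration in ug/mL."""
    return (area - intercept) / slope


print(round(r_squared, 5))
```

In practice, the mean of the triplicate injections at each level would typically be used as the response for that level.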

For UV and MS detection, a photodiode array (PDA, Model UV6000LP,
ThermoQuest Corp., San Jose, CA) and an ion trap mass analyzer (LCQ Classic,
Finnigan/ThermoQuest Corp., San Jose, CA) were connected in series with the
chromatograph and without splitting of the effluent. The PDA was operated in scanning
mode from 220-300 nm. Effluent from the PDA was introduced into the mass analyzer
via atmospheric-pressure chemical ionization (APCI) using the following parameters:
capillary temperature, 150°C; capillary voltage, 3 kV; vaporizer temperature, 400°C;
sheath gas (N2) flow, 80 arbitrary units; auxiliary gas (N2) flow, 5 arbitrary units; and
corona discharge needle, 5 µA/6 kV. Positive-ion detection was performed in full scan
(250-1000 m/z), 2 µscans, 500 ms ion injection time.

Under these conditions, CoQ(8) yielded a mass spectrum with a base peak at
727.5 m/z, corresponding to the protonated 'molecular ion' as well as several satellite
ions from ethanol and/or methanol adducts (Figures 23 and 24). Similarly, CoQ(10)
yielded a mass spectrum with a base peak at 863.6 m/z corresponding to its protonated
'molecular ion' (Figure 25). Several ethanol and/or methanol satellite adducts were
observed as well. Both CoQ(8) and CoQ(10) yielded UV spectra with maxima at 274
nm.
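
The base peaks reported above can be compared with the m/z values calculated for the protonated molecules. The following Python sketch assumes the molecular formulas C49H74O4 for CoQ(8) and C59H90O4 for CoQ(10) and uses standard monoisotopic masses; these formulas and constants are supplied here for illustration and do not appear in the text.

```python
# Monoisotopic masses (u) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}
PROTON = 1.00727647  # mass of a proton (u)

COQ8 = {"C": 49, "H": 74, "O": 4}    # assumed formula for CoQ(8)
COQ10 = {"C": 59, "H": 90, "O": 4}   # assumed formula for CoQ(10)


def protonated_mz(formula):
    """Return the m/z of the singly charged [M+H]+ ion for an elemental formula."""
    neutral = sum(MONOISOTOPIC[element] * n for element, n in formula.items())
    return neutral + PROTON


mz_coq8 = protonated_mz(COQ8)
mz_coq10 = protonated_mz(COQ10)
print(round(mz_coq8, 2), round(mz_coq10, 2))
```

The calculated values fall within about 0.1 to 0.2 m/z of the reported base peaks, which is plausible for a unit-resolution ion trap instrument.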

Two samples were analyzed: MG1655 pUC18 and MG1655 pUC18-DDS.
MG1655 pUC18 is E. coli strain MG1655 transfected with the pUC18 vector only.
MG1655 pUC18-DDS is E. coli strain MG1655 transfected with the pUC18 vector
containing nucleic acid that encodes an R. sphaeroides polypeptide having DDS activity.
The MG1655 pUC18 specimen contained only CoQ(8) (retention time 3.08 min, Figure
21) as confirmed by its mass spectrum (Figure 23), with a base peak at 727.4 m/z and a
UV spectrum with a maximum at 274 nm. The MG1655 pUC18-DDS specimen,
however, contained CoQ(8) and CoQ(10) (Figure 22), both of which were confirmed by
matching mass spectra (Figures 24 and 25) and UV maxima.

Example 7 - Cloning nucleic acid that encodes
a Sphingomonas trueperi polypeptide having DXR activity
Sphingomonas trueperi ATCC 12417 cultures (100-200 mL) were grown in
nutrient broth at 30°C and 250 rpm for 2-3 days. The cells then were pelleted and washed
with a 10 mM Tris:1.0 mM EDTA solution. The pellets were resuspended in 5 mL of
GTE buffer (50 mM glucose, 25 mM Tris HCl (pH 8.0), 10 mM EDTA (pH 8.0)) per 100
mL of culture. Lysozyme and Proteinase K were added to a 1 mg/mL concentration and
mutanolysin was added to 5.5 µg/mL. After a 1.5 hour incubation at 37°C, SDS was
added to a final concentration of 1%, and the concentration of Proteinase K was brought
to 2 mg/mL. After incubation at 50°C for one hour, an equal volume of GTE buffer was
added, and NaCl was added to a 0.15 M concentration. The mixture was extracted with
an equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) and centrifuged at
10,000 rpm for 10 minutes. The supernatant was removed to a clean tube, extracted with
an equal volume of chloroform, and centrifuged at 5,000 rpm for 10 minutes. The
supernatant was treated with RNAse and precipitated with 2.5 volumes of ethanol. The
spooled DNA was washed with 70% ethanol, air dried, and resuspended in 10 mM Tris
(pH 8.5). After resuspending, the resuspended DNA was further cleaned by re-extraction
with phenol:chloroform:isoamyl alcohol and chloroform, and reprecipitation with 1/10
volume 7.4 M NH4OAc and 2.5 volumes ethanol.

A conserved region of the 1-deoxy-D-xylulose 5-phosphate reductoisomerase
(dxr) gene was cloned by PCR. Five degenerate forward and five degenerate reverse
PCR primers were designed from conserved protein regions that were revealed by
aligning known dxr genes (Figure 27). The degenerate sequences were designed from the
conserved regions using the universal codon table. The primers were used in all logical
combinations in PCR using Taq polymerase (Roche Molecular Biochemicals,
Indianapolis, IN) and 1 ng of genomic DNA/µL reaction mix. PCR was conducted using
a touchdown PCR program with 4 cycles at an annealing temperature of 59°C, 4 cycles at
57°C, 4 cycles at 55°C, and 24 cycles at 53°C. Each cycle used an initial 30 second
denaturing step at 94°C and a 1.75 minute extension at 72°C, and the program had an
initial denaturing step for 2 minutes at 94°C and final extension of 5 minutes at 72°C.
The amounts of PCR primer used in the reaction were increased 3-12 fold above typical
PCR amounts depending on the amount of degeneracy in the 3' end of the primer. In
addition, separate PCR reactions containing each individual primer were made to identify
PCR products resulting from single degenerate primers. Fifteen µL of each PCR product
was separated on a 1.5% TAE (Tris-acetate-EDTA)-agarose gel. Degenerate primers F2
(5'-CCSGTSGAYWSSGARCAYAACGCS-3' (SEQ ID NO:132)) and R7
(5'-ATGATGAACAAGGGSCTSGAR-3' (SEQ ID NO:133)) produced a band of about 250
bp, which was the expected size based on dxr genes from other species. This band was
not present in the individual F2 and R7 primer control reactions. Degenerate primers F3
(5'-CATCCVAACTGGWMVATGGG-3' (SEQ ID NO:134)) and R2
(5'-ATYGGYRWWCKCATATCMGG-3' (SEQ ID NO:135)) produced a band of about 200
bp, which also was the expected size. The F2-R7 and F3-R2 fragments were isolated and
purified using a QIAquick Gel Extraction Kit (Qiagen Inc., Valencia, CA). Three µL of
the purified band was ligated into pCRII-TOPO vector, which was then transformed by a
heat-shock method into TOP10 E. coli cells using a TOPO cloning procedure (Invitrogen,
Carlsbad, CA). Transformations were plated on LB media containing 100 µg/mL of
ampicillin and 50 µg/mL of 5-Bromo-4-Chloro-3-Indolyl-β-D-Galactopyranoside
(X-gal). Individual, white colonies were resuspended in about 20 µL of 10 mM Tris and
heated for 10 minutes at 95°C to break open the bacterial cells. To screen individual
colonies, 2 µL of the heated cells was used in a 25 µL PCR reaction as described above
using the appropriate degenerate primers. Plasmid DNA was obtained with a QIAprep
Spin Miniprep Kit (Qiagen, Inc) from cultures of colonies having the desired insert and
used for DNA sequencing with M13R and M13F primers. Sequence analysis revealed
that the F2-R7 and F3-R2 fragments overlapped and were homologous to known dxr
genes.
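
The touchdown PCR program described in this example lowers the annealing temperature in stages. The following Python sketch expands those stages into a per-cycle list of annealing temperatures; the function and variable names are illustrative, and the annealing time per cycle is not modeled because it is not stated.

```python
def touchdown_schedule(stages):
    """Expand (annealing_temp_C, n_cycles) stages into one entry per cycle."""
    return [temp for temp, n_cycles in stages for _ in range(n_cycles)]


# Stages as described: 4 cycles each at 59, 57, and 55 C, then 24 cycles at 53 C.
STAGES = [(59, 4), (57, 4), (55, 4), (53, 24)]

schedule = touchdown_schedule(STAGES)
print(len(schedule), schedule[0], schedule[-1])
```

Starting at the higher annealing temperatures favors the best-matched primer-template pairs in the early cycles, before the lower temperature is used for the bulk of the amplification.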

Genome walking was performed to obtain the complete coding sequence as
follows. The overlapping of the F2-R7 and F3-R2 fragments resulted in a sequence 358
bp in length. The following four primers for conducting genome walking in both
upstream and downstream directions were designed using the portion of this sequence
that was internal to the degenerate primers:



GSP1F 5'-CGAATGGACGACGGATTGGCGATGGAC-3' (SEQ ID NO:136)
GSP2F 5'-TCAGTTCGAGCCCCTTGTTCATCATCGTC-3' (SEQ ID NO:137)
GSP1R 5'-CGAACTGATCGAAGCCTTCCACCTGTTC-3' (SEQ ID NO:138)
GSP2R 5'-GGTCCATCGCCAATCCGTCGTCCATTC-3' (SEQ ID NO:139)
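
Because the forward and reverse genome walking primers anneal to opposite strands, comparing them requires a reverse complement. The Python sketch below shows the operation and, assuming the sequences are transcribed correctly, indicates that GSP2R overlaps the reverse complement of GSP1F over 26 bases; the helper name is illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


GSP1F = "CGAATGGACGACGGATTGGCGATGGAC"
GSP2R = "GGTCCATCGCCAATCCGTCGTCCATTC"

rc_gsp1f = reverse_complement(GSP1F)
# GSP2R without its first base matches rc_gsp1f without its last base.
overlaps = GSP2R[1:] == rc_gsp1f[:-1]
print(rc_gsp1f, overlaps)
```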

The GSP1F and GSP2F primers faced upstream, the GSP1R and GSP2R primers faced
downstream, and the GSP2F and GSP2R primers were nested inside the GSP1F and
GSP1R primers. Genome walking was conducted according to the manual for
CLONTECH's Universal Genome Walking Kit (CLONTECH Laboratories, Inc., Palo
Alto, CA) with the exception that the enzymes FspI and SmaI were used in place of DraI
and EcoRV. The DraI and EcoRV enzymes were replaced because they cut S. trueperi
genomic DNA too infrequently to give fragment lengths amenable to PCR. The PCR
mixture contained 5% DMSO. First round PCR was conducted in a Perkin Elmer 9700
Thermocycler with 7 cycles consisting of 2 seconds at 94°C and 3 minutes at 72°C, and
36 cycles consisting of 2 seconds at 94°C and 3 minutes at 66°C, with a final extension at
66°C for 4 minutes. Second round PCR used 5 cycles consisting of 2 seconds at 94°C
and 3 minutes at 72°C, and 26 cycles consisting of 2 seconds at 94°C and 3 minutes at
66°C, with a final extension at 66°C for 4 minutes. Nine µL of the first round product
and seven µL of the second round product were separated on a 1.5% TAE-agarose gel. A
1.3 kb band was obtained from the second round product for the SmaI forward reaction,
an 800 bp band for the StuI reverse reaction, and a 750 bp band for the PvuII reverse
reaction. These fragments were gel purified, cloned, and sequenced. Internal primers
were used to amplify and obtain additional sequence of the gene. Sequence analysis
revealed that the sequence derived from genome walking overlapped with the original
fragments and contained an entire coding sequence homologous to known dxr genes. The
full-length clone containing coding and non-coding sequence was 2017 bp in length
(Figure 28). The open reading frame starting with the first GTG site was 1161 bp in
length (Figure 29), which encoded a polypeptide with 386 amino acid residues (Figure
30).

Example 8 - Making recombinant microorganisms
Rhodobacter sphaeroides (ATCC 35053) was routinely maintained on Luria
Bertani (Miller) agar (Fisher Scientific) plates. When needed, R. sphaeroides was
cultured as follows. A 5 mL culture was grown in a 15 mL culture tube at 30°C in an Innova
4230 Incubator Shaker (New Brunswick Scientific, Edison, NJ) with a shaking speed of
250 rpm. Each 5 mL culture was started by inoculating liquid media (Sistrom media
supplemented with 20% LB) with a single colony. The liquid media contained the
following ingredients per liter: 2.72 g KH2PO4, 0.5 g (NH4)2SO4, 0.5 g NaCl, 0.2 g EDTA
disodium salt, 0.3 g MgSO4·7H2O, 0.033 g CaCl2·2H2O, 0.2 mg FeSO4·7H2O, 0.02
mL (NH4)6Mo7O24·4H2O (1% solution), 1 mL Trace element solution, 0.2 mL Vitamin
solution, 5 g Luria Bertani Broth Mix, and 8 mL Glucose (50%). The Trace element
solution contained the following ingredients per liter: 1.765 g EDTA disodium salt, 10.95
g ZnSO4·7H2O, 5 g FeSO4·7H2O, 1.54 g MnSO4·H2O, 0.392 g CuSO4·5H2O, 0.284 g
Co(NO3)2·6H2O, and 0.114 g H3BO3. The Vitamin solution contained the following
ingredients per liter: 10 g Nicotinic acid, 5 g Thiamine HCl, and 0.01 g Biotin. The
vitamins and glucose were added after the media cooled to room temperature after
autoclaving. When necessary, the media was supplemented with one or more of the
following antibiotics: Kanamycin (25 µg/mL; final concentration), Spectinomycin (25
µg/mL; final concentration), and/or Streptomycin (25 µg/mL; final concentration).

Electrocompetent R. sphaeroides cells 

Electrocompetent R. sphaeroides cells were made as follows. A 5 mL culture of
R. sphaeroides was grown overnight at 30°C in Sistrom's media supplemented with 20%
LB. This culture was diluted 1/100 in 300 mL of the same media and grown to an OD660
of 0.5-0.8. The cells were chilled on ice for 10 minutes and then centrifuged for 6
minutes at 7,500 g. The supernatant was discarded, and the cell pellet was resuspended in
ice-cold 10% glycerol at half of the original volume. The cells were pelleted by
centrifugation for 6 minutes at 7,500 g. The supernatant was again discarded, and the cells
were resuspended in ice-cold 10% glycerol at one quarter of the original volume. The last
centrifugation and resuspension steps were repeated, followed by centrifugation for 6
minutes at 7,500 g. The supernatant was decanted, and the cells resuspended in the small
volume of glycerol that did not drain out. Additional ice-cold 10% glycerol was added to
resuspend the cells, if necessary. Forty µL of the resuspended cells was used in a test
electroporation to determine if the cells needed to be concentrated by centrifugation or
diluted with 10% ice-cold glycerol. Time constants of 8.5-9.0 milliseconds resulted in
good transformation efficiencies. If cells were too dilute, the time constant was greater
than 9.0 milliseconds and transformation efficiencies were low. If cells were too
concentrated, the electroporation would spark. Once an acceptable time constant was
achieved, cells were aliquoted into cold microfuge tubes and stored at -80°C. All water
used for media and glycerol was 18.2 Mohm-cm or higher.

Electrocompetent R. sphaeroides cells were electroporated as follows. One µL of
plasmid DNA was gently mixed into 40 µL of R. sphaeroides electrocompetent cells,
which were then transferred to an electroporation cuvette with a 0.2 cm electrode gap.
Electroporations were conducted using a Biorad Gene Pulser II (Biorad, Hercules, CA)
with settings at 2.5 kV of energy, 400 ohms of resistance, and 25 µF of capacitance. Cells
were recovered in 400 µL SOC media at 30°C for 6-16 hours. The cells were then plated
(200 µL per plate) on the appropriate selective media. Transformation efficiencies
averaged about 2,000 transformants/µg of DNA.
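
The time constants discussed for R. sphaeroides electroporation can be related to the pulser settings with a simple resistor-capacitor model. The following Python sketch is an idealized illustration only: it treats the sample as a resistor in parallel with the 400 ohm setting, and the 3600 ohm sample resistance is a hypothetical value chosen to give a time constant in the reported range.

```python
def time_constant_ms(r_parallel_ohm, capacitance_uF, r_sample_ohm=None):
    """Return the RC time constant in milliseconds.

    If r_sample_ohm is given, the sample is modeled as a resistor in
    parallel with the pulse controller resistance (idealized model).
    """
    if r_sample_ohm is None:
        r_total = r_parallel_ohm
    else:
        r_total = 1.0 / (1.0 / r_parallel_ohm + 1.0 / r_sample_ohm)
    # ohm * microfarad = microseconds; divide by 1000 for milliseconds.
    return r_total * capacitance_uF / 1000.0


nominal = time_constant_ms(400, 25)             # settings only, no sample load
with_sample = time_constant_ms(400, 25, 3600)   # hypothetical sample resistance
print(nominal, round(with_sample, 2))
```

In this model a more dilute, higher-resistance cell suspension pushes the time constant toward the nominal 10 millisecond limit, while a more concentrated, more conductive suspension lowers it, which is in line with the observations described above.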

Electrocompetent E. coli cells

Electrocompetent E. coli strain S17-1 cells were made as follows. A 5 mL culture
of E. coli strain S17-1 was grown overnight at 30°C in LB media supplemented with 25
µg/mL of streptomycin and 25 µg/mL of spectinomycin. This culture was diluted 1/100
in 300 mL of the same media and grown to an OD660 of 0.5-0.8. The cells were chilled on
ice for 10 minutes and then centrifuged for 6 minutes at 7,500 g. The supernatant was
discarded, and the cell pellet was resuspended in ice-cold 10% glycerol at half of the
original volume. The cells were pelleted by centrifugation for 6 minutes at 7,500 g. The
supernatant was again discarded, and the cells were resuspended in ice-cold 10% glycerol
at one quarter of the original volume. The last centrifugation and resuspension steps were
repeated, followed by centrifugation for 6 minutes at 7,500 g. The supernatant was
decanted, and the cells resuspended in the small volume of glycerol that did not drain out.
Additional ice-cold 10% glycerol was added to resuspend the cells, if necessary. Cells
were aliquoted into cold microfuge tubes and stored at -80°C.

Electrocompetent E. coli strain S17-1 cells were electroporated as follows. Forty
µL of competent cells was used per electroporation. Electroporation was conducted using
a Biorad Gene Pulser II and a standard E. coli protocol: 2.5 kV of energy, 200 ohms of
resistance, and 25 µF of capacitance. Electroporated cells were recovered in 250-1000 µL
of SOC media for one hour, and 10-200 µL of culture was plated per plate of selective
media. Transformation efficiencies averaged about 1.5 x 10^4 transformants/µg of DNA.
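
A transformation efficiency of this kind is typically derived from a colony count, the fraction of the recovery culture that was plated, and the mass of DNA used. The Python sketch below illustrates the arithmetic; the colony count, volumes, and DNA mass in the example are hypothetical values chosen only to give a result of the same order as that reported above.

```python
def transformants_per_ug(colonies, plated_ul, recovery_ul, dna_ug):
    """Return transformants per microgram of DNA.

    colonies: colonies counted on one plate.
    plated_ul: volume of recovery culture spread on that plate.
    recovery_ul: total volume of the recovery culture.
    dna_ug: mass of plasmid DNA used in the electroporation.
    """
    total_transformants = colonies * (recovery_ul / plated_ul)
    return total_transformants / dna_ug


# Hypothetical example: 30 colonies from 100 uL of a 1000 uL recovery, 0.02 ug DNA.
efficiency = transformants_per_ug(30, 100, 1000, 0.02)
print(f"{efficiency:.2e}")
```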

Constructs

Various clones were overexpressed in R. sphaeroides using the broad-host-range
vector pBBR1MCS2 (Kovach et al., Gene, 166:175-176 (1995)) that was engineered to
have either an R. sphaeroides rrnB promoter, an R. sphaeroides glnB promoter, or a tet
promoter. The pBBR1MCS2 vector is mobilizable and relatively small (5,144 bp),
replicates in R. sphaeroides, has a multiple cloning site with lacZα color selection, and
carries a kanamycin resistance gene. All restriction enzymes and T4 DNA ligase were
obtained from New England Biolabs (Beverly, MA) unless otherwise indicated. All
plasmid DNA preparations were done using QIAprep Spin Miniprep Kits or Qiagen Maxi
Prep Kits, and all gel purifications were done using QIAquick Gel Extraction Kits
(Qiagen, Valencia, CA).

pMCS2rrnBP 

The vector designated pMCS2rrnBP, which contains an R. sphaeroides rrnB
promoter, was constructed by inserting a copy of the R. sphaeroides rrnB promoter
(rrnBP) into the pBBR1MCS2 vector. The rrnB promoter was isolated from the
pTEX124 vector (obtained from S. Kaplan) by digestion with the restriction enzyme
BamHI, which releases the promoter as a 363 bp fragment. Alternatively, the rrnB
promoter can be obtained by PCR amplifying it from R. sphaeroides genomic DNA using
primers based on published rrnB sequence (GenBank® accession number X53854). This
fragment was gel purified from a 2% Tris-acetate-EDTA (TAE) agarose gel. The
pBBR1MCS2 vector was also digested with BamHI, and the enzyme heat inactivated at
80°C for 20 minutes. The digested vector was then dephosphorylated with shrimp
alkaline phosphatase (Roche Molecular Biochemicals, Indianapolis, IN) and gel purified
from a 1% TAE-agarose gel. The prepared vector and the rrnBP fragment were ligated
using T4 DNA ligase at 16°C for 16 hours. One µL of ligation reaction was used to
electroporate 40 µL of E. coli Electromax™ DH10B™ cells (Life Technologies, Inc.,
Rockville, MD). Electroporated cells were plated on LB media containing 25 µg/mL of
kanamycin (LBK). Plasmid DNA was isolated from cultures of single colonies and was
digested with HindIII restriction enzyme to confirm the presence of a single insertion of
the rrnB promoter. The sequence of the rrnBP inserts for these colonies was also
confirmed by DNA sequencing.

pMCS2glnBP

The vector designated pMCS2glnBP, which contains an R. sphaeroides glnB
promoter, was constructed by inserting a copy of the R. sphaeroides glnB promoter
(glnBP) into the pBBR1MCS2 vector. The glnB promoter was PCR amplified from
genomic DNA obtained from R. sphaeroides strain 35053. The following primers were
designed based on sequence information obtained from GenBank® accession number
X71659:

glnBF 5'-ATTATCTAGAATCCGCCCCGCCTCCACCTC-3' (SEQ ID NO:140)
glnBR 5'-GATGGATCCTGGGTAGGGTCGCTGCTGTCC-3' (SEQ ID NO:141)

The primers introduced an XbaI restriction site at the 5' end and a BamHI
restriction site at the 3' end. The following reaction mix and PCR program was used to
amplify the promoter region of the glnB gene.



Reaction Mix                              PCR program

Pfu 10X buffer              10 µL         94°C   2 minutes
DMSO                         5 µL         7 cycles of:
dNTP mix (10 mM)             4 µL           94°C   30 seconds
glnBF (50 µM)                2 µL           61°C   45 seconds
glnBR (50 µM)                2 µL           72°C   3 minutes
Genomic DNA (50 ng/µL)       2 µL         25 cycles of:
Pfu enzyme (2.5 U/µL)        2 µL           94°C   30 seconds
DI water                    73 µL           66°C   45 seconds
                                            72°C   3 minutes
Total:                     100 µL         72°C   7 minutes
                                          4°C    Until used further
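
The final concentration of each component in the 100 µL glnB promoter PCR can be derived from the stock concentrations and volumes in the reaction mix. The Python sketch below performs that dilution arithmetic; the dictionary keys and variable names are illustrative labels rather than terms from the protocol.

```python
# Volumes (uL) of each component in the reaction mix.
volumes_ul = {
    "Pfu 10X buffer": 10, "DMSO": 5, "dNTP mix": 4,
    "forward primer": 2, "reverse primer": 2,
    "genomic DNA": 2, "Pfu enzyme": 2, "DI water": 73,
}

# Stock concentrations (value, unit) for components with a stated stock.
stocks = {
    "dNTP mix": (10.0, "mM"),
    "forward primer": (50.0, "uM"),
    "reverse primer": (50.0, "uM"),
    "genomic DNA": (50.0, "ng/uL"),
    "Pfu enzyme": (2.5, "U/uL"),
}

total_ul = sum(volumes_ul.values())

# Final concentration = stock concentration * component volume / total volume.
final = {
    name: (conc * volumes_ul[name] / total_ul, unit)
    for name, (conc, unit) in stocks.items()
}

dmso_percent = 100.0 * volumes_ul["DMSO"] / total_ul
print(total_ul, final, dmso_percent)
```

The listed volumes sum to the stated 100 µL total, giving about 1 µM of each primer, 0.4 mM dNTP mix, 100 ng of genomic DNA, 5 units of Pfu enzyme, and 5% DMSO per reaction.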



The PCR product was separated on a 1.2% TAE-agarose gel. A fragment of about 500 bp
was excised and gel purified. The isolated DNA was restricted with XbaI and
BamHI, and the resulting digested DNA was column purified using a Qiagen gel isolation kit.
Three µg of pBBR1MCS2 plasmid DNA was digested with BamHI and XbaI. The
digestion was inactivated at 80°C for 20 minutes. The digested vector was then
dephosphorylated with shrimp alkaline phosphatase and gel purified on a 1% TAE-agarose
gel. Eighty-six ng of the prepared pBBR1MCS2 vector was ligated with 60 ng of
the digested glnBP PCR product using T4 DNA ligase at 14°C for 14-16 hours. One µL
of ligation reaction was used to electroporate 40 µL of E. coli Electromax™ DH10B™
cells. Electroporated cells were plated on LB media containing 25 µg/mL of kanamycin
and 50 µg/mL of Xgal (LBKX). Eight individual, white colonies were selected, and their
plasmid DNA isolated using a QIAprep Spin Miniprep Kit. Plasmid DNA isolated from
each colony was digested in separate reaction mixtures with PstI and a combination of
EcoRI/XbaI. All eight clones had a restriction pattern that indicated the presence of the
insert. The sequence of three clones was verified.


pMCS2tetP 

The vector designated pMCS2tetP, which contains a tet promoter, was constructed
by cloning the promoter for the tetracycline resistance determinants from transposon
Tn1721 (Waters et al., Nucleic Acids Research, 11(17):6089-6105 (1983)) into the
pBBR1MCS2 vector. The tetA gene promoter (tetP) was amplified using plasmid
pRK415 as template. The following primers were designed to introduce an XbaI
restriction site at the beginning of the amplified fragment and a BamHI site at the end of
the amplified fragment.

TETXBAF 5'-TTATCTAGAACCGTCTACGCCGACCTCGTTCAAC-3' (SEQ ID NO:142)
TETBAMR 5'-TTAGGATCCCCTCCGCTGGTCCGATTGAAC-3' (SEQ ID NO:143)
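
The restriction sites that these primers are designed to introduce can be located directly in the primer sequences. The Python sketch below is illustrative only. It uses the standard recognition sequences for XbaI (TCTAGA) and BamHI (GGATCC), takes the primer sequences with the line-wrap hyphens removed, and uses the hypothetical helper name find_sites.

```python
# Recognition sequences of the two enzymes used for directional cloning.
RESTRICTION_SITES = {"XbaI": "TCTAGA", "BamHI": "GGATCC"}

# Primer sequences as listed above (5' to 3'), line-wrap hyphens removed.
PRIMERS = {
    "TETXBAF": "TTATCTAGAACCGTCTACGCCGACCTCGTTCAAC",
    "TETBAMR": "TTAGGATCCCCTCCGCTGGTCCGATTGAAC",
}


def find_sites(sequence, sites=RESTRICTION_SITES):
    """Return {enzyme: 0-based position} for each site present in the sequence."""
    sequence = sequence.upper()
    return {
        enzyme: sequence.find(site)
        for enzyme, site in sites.items()
        if site in sequence
    }


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Both recognition sequences are palindromic, so searching the primer strand alone is sufficient. Each primer carries exactly one of the two sites near its 5' end, which is what makes the XbaI/BamHI double digest directional.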

The PCR mix contained the following: 1X Native Plus Pfu buffer, 20 ng pRK415
plasmid DNA, 0.2 µM of each primer, 0.2 mM of each dNTP, 5% DMSO (v/v), and 10
units of native Pfu DNA polymerase in a final volume of 200 µL. The PCR reaction was
performed in a Perkin Elmer Geneamp PCR System 2400 under the following conditions:
an initial denaturation at 94°C for 1 minute; 8 cycles of 94°C for 30 seconds, 60°C for 45
seconds, and 72°C for 45 seconds; 24 cycles of 94°C for 30 seconds, 66°C for 45
seconds, and 72°C for 45 seconds; and a final extension for 7 minutes at 72°C. The
amplification product was then separated by gel electrophoresis using a 2% TAE-agarose
gel. A 160 bp fragment was excised from the gel and purified. The purified fragment
was digested simultaneously with XbaI and BamHI restriction enzymes, and purified with
a QIAquick PCR Purification Kit. Three µg of pBBR1MCS2 plasmid DNA was digested
with BamHI and XbaI, and the digest was inactivated at 80°C for 20 minutes. The
digested vector was then dephosphorylated with shrimp alkaline phosphatase and gel
purified on a 1% TAE-agarose gel.
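
The two-stage cycling conditions given above for the tetP amplification can be written down as a small data structure, which makes it easy to total the cycles and the programmed hold time. The Python sketch below is illustrative only; the Step and Stage types and the helper names are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # hold time at that temperature


@dataclass(frozen=True)
class Stage:
    cycles: int   # number of times the steps are repeated
    steps: tuple  # ordered Step objects


# tetP amplification program as described in the text.
TETP_PROGRAM = (
    Stage(1, (Step(94, 60),)),                                # initial denaturation
    Stage(8, (Step(94, 30), Step(60, 45), Step(72, 45))),     # lower annealing temperature
    Stage(24, (Step(94, 30), Step(66, 45), Step(72, 45))),    # higher annealing temperature
    Stage(1, (Step(72, 420),)),                               # final extension
)


def amplification_cycles(program):
    """Count cycles in stages that contain a full denature/anneal/extend sequence."""
    return sum(stage.cycles for stage in program if len(stage.steps) > 1)


def hold_time_seconds(program):
    """Total programmed hold time, excluding ramping between temperatures."""
    return sum(
        stage.cycles * sum(step.seconds for step in stage.steps)
        for stage in program
    )


if __name__ == "__main__":
    print(amplification_cycles(TETP_PROGRAM), "cycles")
    print(hold_time_seconds(TETP_PROGRAM) / 60, "minutes of hold time")
```

For this program the sketch gives 32 amplification cycles and 72 minutes of programmed hold time.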

100 ng of the prepared pBBR1MCS2 vector was ligated with 36 ng of the digested
tetP PCR product using T4 DNA ligase at 16°C for 16 hours. One µL of ligation reaction
was used to electroporate 40 µL of E. coli Electromax™ DH5α™ cells. Electroporated
cells were plated on LB media containing 25 µg/mL of kanamycin and 50 µg/mL of Xgal
(LBKX). Individual, white colonies were resuspended in about 25 µL of 10 mM Tris,
and 2 µL of the resuspension was plated on LBKX. The remaining resuspension was
heated for 10 minutes at 95°C to break open the bacterial cells. Two µL of the heated
cells was used in a 25 µL PCR reaction using the following primers homologous to the
vector and flanking the cloning site:

MCS2FS 5'-AGGCGATTAAGTTGGGTAAC-3' (SEQ ID NO:144)
MCS2RS 5'-GACCATGATTACGCCAAG-3' (SEQ ID NO:145)


The PCR mix contained the following: 1X Taq PCR buffer, 0.2 µM each primer,
0.2 mM each dNTP, and 1 unit of Taq DNA polymerase per reaction. The PCR reaction
was performed in an MJ Research PTC100 under the following conditions: an initial
denaturation at 94°C for 2 minutes; 32 cycles of 94°C for 30 seconds, 55°C for 45
seconds, and 72°C for 1 minute; and a final extension for 7 minutes at 72°C. All colonies
showed a single insertion event. Plasmid DNA was isolated from cultures of two
individual colonies and sequenced to confirm the DNA sequence of the tet promoter in
the construct.
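
As a rough plausibility check on the 55°C annealing step used with the MCS2FS and MCS2RS screening primers, a melting temperature can be estimated from base composition. The Python sketch below uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C) in °C, which is only an approximation for short oligonucleotides. It is offered as an illustration and is not a calculation reported in this document; the function name wallace_tm is hypothetical.

```python
# Colony-screening primers flanking the cloning site (5' to 3').
SCREENING_PRIMERS = {
    "MCS2FS": "AGGCGATTAAGTTGGGTAAC",
    "MCS2RS": "GACCATGATTACGCCAAG",
}


def wallace_tm(sequence):
    """Estimate Tm in degrees C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    sequence = sequence.upper()
    at = sequence.count("A") + sequence.count("T")
    gc = sequence.count("G") + sequence.count("C")
    if at + gc != len(sequence):
        raise ValueError("sequence contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    for name, seq in SCREENING_PRIMERS.items():
        print(f"{name}: {len(seq)} nt, estimated Tm {wallace_tm(seq)} C")
```

The estimates come out at 58°C for MCS2FS and 54°C for MCS2RS, which brackets the 55°C annealing temperature used in the screening program.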

pMCS2rrnBP/Stdxs

The nucleic acid encoding a S. trueperi polypeptide having DXS activity was
cloned in the pMCS2rrnBP vector as follows. The S. trueperi dxs gene was amplified by
PCR using primers homologous to sequence upstream and downstream of the gene.
These primers, STDXSMCSF and STDXSMCSR, were designed to introduce a ClaI
restriction site at the beginning of the amplified fragment and a KpnI site at the end of the
amplified fragment.

STDXSMCSF 5'-GATAATCGATGTGTGACTGACCTGTCCAAC-3' (SEQ ID NO:146)
STDXSMCSR 5'-CTTAGGTACCATGTTGGAGATTCAAGGTGG-3' (SEQ ID NO:147)

The PCR mix contained the following: 1X Native Plus Pfu buffer, 200 ng S.
trueperi genomic DNA, 0.2 µM of each primer, 0.2 mM each dNTP, 5% DMSO (v/v),
and 10 units of native Pfu DNA polymerase (Stratagene, La Jolla, CA) in a final volume
of 200 µL. The PCR reaction was performed in a Perkin Elmer Geneamp PCR System
2400 under the following conditions: an initial denaturation at 94°C for 1 minute; 8 cycles
of 94°C for 30 seconds, 54°C for 45 seconds, and 72°C for 3.5 minutes; 27 cycles of
94°C for 30 seconds, 60°C for 45 seconds, and 72°C for 3.5 minutes; and a final
extension for 7 minutes at 72°C. The amplification product was then separated by gel
electrophoresis using a 1% TAE-agarose gel. A 2.2 Kb fragment was excised from the
gel and purified. The purified fragment was digested with ClaI restriction enzyme,
purified with a QIAquick PCR Purification Kit, digested with KpnI restriction enzyme,
purified again with a QIAquick PCR Purification Kit, and quantified on a minigel.

Three µg of the pMCS2rrnBP vector was digested with the restriction enzyme
ClaI, gel purified on a 1% TAE-agarose gel, digested with KpnI, purified with a
QIAquick PCR Purification Kit, dephosphorylated with shrimp alkaline phosphatase, and
purified again with a QIAquick PCR Purification Kit. 120 ng of the digested PCR
product containing the S. trueperi dxs gene and 50 ng of the prepared pMCS2rrnBP
vector were ligated using T4 DNA ligase at 16°C for 16 hours. One µL of the ligation
reaction was used to electroporate 40 µL of E. coli Electromax™ DH10B™ cells. The
electroporated cells were plated onto media. Plasmid DNA was isolated from cultures of
individual colonies and evaluated for the presence of the desired insert by restriction
enzyme analysis with HindIII and SacI enzymes. The sequence of the Stdxs insert was
confirmed by DNA sequencing. The resulting plasmid containing the Stdxs sequence
under the control of the rrnB promoter was designated pMCS2rrnBP/Stdxs.
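
The masses used in the ligation above can be converted into an approximate insert-to-vector molar ratio, since for double-stranded DNA the number of molecules is proportional to mass divided by length. The Python sketch below is illustrative only. The insert length is the 2.2 Kb stated above, but the length of the prepared pMCS2rrnBP vector is not given in this section, so ASSUMED_VECTOR_BP is a placeholder assumption that should be replaced with the real value.

```python
def insert_to_vector_molar_ratio(insert_ng, insert_bp, vector_ng, vector_bp):
    """Molar ratio of insert to vector for double-stranded DNA.

    Moles are proportional to mass / length, so the per-base-pair
    molecular weight cancels out of the ratio.
    """
    if min(insert_ng, insert_bp, vector_ng, vector_bp) <= 0:
        raise ValueError("all masses and lengths must be positive")
    return (insert_ng / insert_bp) / (vector_ng / vector_bp)


# Values stated in the text for the S. trueperi dxs ligation.
INSERT_NG = 120.0
INSERT_BP = 2200      # the 2.2 Kb PCR fragment
VECTOR_NG = 50.0

# ASSUMPTION: the vector length is not stated in this section; 5300 bp is a
# hypothetical placeholder, not a measured or reported value.
ASSUMED_VECTOR_BP = 5300

if __name__ == "__main__":
    ratio = insert_to_vector_molar_ratio(
        INSERT_NG, INSERT_BP, VECTOR_NG, ASSUMED_VECTOR_BP
    )
    print(f"insert:vector molar ratio is about {ratio:.1f}:1")
```

Under the assumed vector length the ratio is a little under 6:1. The result scales linearly with the vector length, so the conclusion that insert is in molar excess holds for any plausible vector size.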

Purified pMCS2rrnBP/Stdxs plasmid DNA derived from a colony having the
correct sequence was then electroporated into electrocompetent cells of R. sphaeroides
strain 35053. Plasmid DNA was isolated from cultures of individual R. sphaeroides
colonies. Restriction patterns of plasmid preparations from R. sphaeroides are difficult to
analyze due to the presence of multiple native plasmids in this species. To check the
plasmid integrity in R. sphaeroides, one µL of the plasmid preparation from a transformed
R. sphaeroides colony was used to re-transform E. coli Electromax™ DH10B™ cells by
electroporation. Electroporated cells were plated on LBK media. Plasmid DNA was
isolated from cultures of individual colonies and evaluated using SacI and HindIII
restriction digests.



pMCS2rrnBP/Stdxs2

A second pMCS2rrnBP plasmid containing the nucleic acid encoding a S. trueperi
polypeptide having DXS activity was constructed. This construct was made using the
following forward primer designed to introduce the ribosomal binding site (rbs) from the
R. sphaeroides dxs1 gene along with a ClaI restriction site.

SXSCLAF2 5'-ACTATCGATGAAGGAAGAGCATGGCTGACCTACCCAAGAC-3' (SEQ ID NO:146)



S. trueperi genomic DNA was used as template in a PCR mixture using the
primers SXSCLAF2 and STDXSMCSR. The PCR program and reaction mixture used
were identical to those described for the pMCS2rrnBP/Stdxs construct. The PCR product
was gel purified, digested with ClaI, purified with a QIAquick PCR Purification Kit,
digested with restriction enzyme KpnI, and purified again with a QIAquick PCR
Purification Kit. 150 ng of digested PCR product was ligated into 50 ng of the prepared
pMCS2rrnBP vector using T4 DNA ligase at 16°C for 16 hours. One µL of the ligation
reaction was transformed into E. coli Electromax™ DH10B™ cells, and the
electroporated cells were plated onto LBK plates. Plasmid DNA was isolated from
cultures of individual colonies and evaluated for the presence of the desired insert by
restriction enzyme analysis with HindIII and SacI enzymes. The sequence of the dxs
insert was confirmed by DNA sequencing. The resulting plasmid containing the Stdxs
sequence under the control of the rrnB promoter and having an R. sphaeroides ribosomal
binding site was designated pMCS2rrnBP/Stdxs2.

A confirmed construct was electroporated into R. sphaeroides strain 35053, and
the electroporated cells were plated onto LBK media. Individual colonies were
resuspended in about 25 µL of 10 mM Tris, and 2 µL of the resuspension was plated on
LBK media. The remaining resuspension was heated for 10 minutes at 95°C to break open
the bacterial cells, and 2 µL of the heated cells was used in a 25 µL PCR reaction using the
SXSCLAF2 and STDXSMCSR primers. The PCR mix contained the following: 1X Taq
PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of Taq
DNA polymerase (Roche) per reaction. The PCR reaction was performed in an MJ
Research PTC100 under the following conditions: an initial denaturation at 94°C for 2
minutes; 8 cycles of 94°C for 30 seconds, 54°C for 1 minute, and 72°C for 3.5 minutes;
24 cycles of 94°C for 30 seconds, 60°C for 1 minute, and 72°C for 3.5 minutes; and a final
extension for 7 minutes at 72°C.


pMCS2rrnBP/Rsdds 

The nucleic acid encoding a R. sphaeroides polypeptide having DDS activity was
cloned in the pMCS2rrnBP vector as follows. The R. sphaeroides dds gene was PCR
amplified using the following primer pair:

RDS18F 5'-ACTAGAATTCCGCAACAGTTCCTTCATGTC-3' (SEQ ID NO:147)
RSDDSMCSR 5'-CTAGATCGATACTTGCGGTCGGACTGATAG-3' (SEQ ID NO:148)

The forward primer was located upstream of the start codon and introduced an
EcoRI restriction site, while the reverse primer was located downstream of the stop codon
and introduced a ClaI restriction site. Since the forward primer was located upstream, the
R. sphaeroides dds maintained its native ribosomal binding site. The following reaction
mix and PCR program were used to amplify the R. sphaeroides dds gene.

Reaction Mix                              Program

Pfu 10X buffer             10 µL          94°C   2 minutes
DMSO                        5 µL          8 cycles of:
dNTP mix (10 mM)            4 µL            94°C   30 seconds
RDS18F (50 µM)              2 µL            55°C   45 seconds
RSDDSMCSR (50 µM)           2 µL            72°C   3 minutes
Genomic DNA (50 ng/µL)      2 µL          21 cycles of:
Pfu enzyme (2.5 U/µL)       1 µL            94°C   30 seconds
DI water                   74 µL            61°C   45 seconds
                                            72°C   3 minutes
Total:                    100 µL          72°C   7 minutes
                                          4°C    Until used further



The PCR product was separated on a 1% TAE-agarose gel, and a fragment of about 1.8 Kb
was excised and gel purified. The isolated DNA was restricted with EcoRI and
ClaI, and was column purified using a Qiagen gel isolation kit. Three µg of pMCS2rrnBP
vector DNA was digested with EcoRI, and the linear DNA was gel isolated using a
Qiagen gel isolation kit. The vector was further digested with ClaI, and the DNA was
column purified. The double-digested vector was then dephosphorylated with shrimp
alkaline phosphatase and purified using a QIAquick PCR Purification Kit. The
EcoRI/ClaI-digested R. sphaeroides dds PCR product was ligated into the prepared vector
using T4 DNA ligase for 14-16 hours at 16°C. One µL of the ligation reaction was
transformed into E. coli Electromax™ DH10B™ cells, which were then plated on LBK
(25 µg/mL) media. Individual colonies were resuspended in about 25 µL of DI water, and
2 µL of the resuspension was plated on LBK. The remaining resuspension was heated for
10 minutes at 95°C to break open the bacterial cells, and 2 µL of the heated cells was used
in a 25 µL PCR reaction using the RDS18F and RSDDSMCSR primers. The PCR mix
contained the following: 1X Taq PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP,
5% DMSO (v/v), and 1 unit of Taq DNA polymerase per reaction. The PCR reaction was
performed under the following conditions: an initial denaturation at 94°C for 2 minutes; 6
cycles of 94°C for 30 seconds, 55°C for 45 seconds, and 72°C for 3 minutes; 25 cycles of
94°C for 30 seconds, 61°C for 45 seconds, and 72°C for 3 minutes; and a final extension for
7 minutes at 72°C. The resulting plasmid containing the Rsdds sequence under the
control of the rrnB promoter was designated pMCS2rrnBP/Rsdds.

The pMCS2rrnBP/Rsdds plasmid was electroporated into E. coli strain S17-1.
This strain contains a chromosomal copy of the trans-acting elements that mobilize
oriT-containing plasmids during conjugation with a second bacterial strain. It also carries a
gene conferring resistance to the antibiotics streptomycin and spectinomycin.
Using the S17-1 strain, the pMCS2rrnBP/Rsdds plasmid was transferred to R.
sphaeroides 35053 by conjugation. Individual colonies were purified by restreaking on
LBK plates. Single colonies were screened by PCR using the RDS18F and
RSDDSMCSR primers to confirm the presence of the insert as described above.

pMCS2rrnBP/Stdds

The nucleic acid encoding a S. trueperi polypeptide having DDS activity was 
cloned in the pMCS2rrnBP vector as follows. The S. trueperi dds gene was PCR 
amplified using the following primer pair: 

STDDSMCSF 5'-GTCGCTCGAGATCAGATAATCGTCGCTCAA-3' (SEQ ID NO:149)
STDDSMCSR 5'-ATATGGTACCGACATGGACGAGGAAGACGC-3' (SEQ ID NO:150)






The forward primer was located upstream of the start codon and introduced an
XhoI restriction site, while the reverse primer was located downstream of the stop codon
and introduced a KpnI restriction site. Since the forward primer was located upstream,
the S. trueperi dds fragment maintained its native ribosomal binding site. The following
reaction mix and PCR program were used to amplify the S. trueperi dds gene.



Reaction Mix                              Program

Pfu 10X buffer             10 µL          94°C   2 minutes
DMSO                        5 µL          8 cycles of:
dNTP mix (10 mM)            4 µL            94°C   30 seconds
SHDDSMCSF (50 µM)           2 µL            55°C   45 seconds
SHDDSMCSR (50 µM)           2 µL            72°C   3 minutes
Genomic DNA (50 ng/µL)      2 µL          21 cycles of:
Pfu enzyme (2.5 U/µL)       1 µL            94°C   30 seconds
DI water                   74 µL            61°C   45 seconds
                                            72°C   3 minutes
Total:                    100 µL          72°C   7 minutes
                                          4°C    Until used further

The PCR product was separated on a 1% TAE-agarose gel, and a fragment of about 1.6 Kb
was excised. The DNA was isolated using a Qiagen gel isolation kit. The
isolated DNA was restricted with XhoI and KpnI, and was column purified using a
Qiagen gel isolation kit. Two µg of pMCS2rrnBP vector DNA was digested with KpnI,
and the linear DNA was gel isolated using a Qiagen gel isolation kit. The vector was
further digested with XhoI, and the DNA was column purified. The double-digested
vector was then dephosphorylated with shrimp alkaline phosphatase and column purified
using a Qiagen gel purification kit. The XhoI/KpnI-digested S. trueperi dds PCR product
was ligated into the prepared vector using T4 DNA ligase for 14-16 hours at 16°C. One
µL of the ligation reaction was transformed into E. coli Electromax™ DH10B™ cells,
which were then plated on LBK (25 µg/mL) media. Individual colonies were
resuspended in about 25 µL of DI water, and 2 µL of the resuspension was plated on
LBK. The remaining resuspension was heated for 10 minutes at 95°C to break open the
bacterial cells, and 2 µL of the heated cells was used in a 25 µL PCR reaction using the
SHDDSMCSF and SHDDSMCSR primers. The PCR mix contained the following: 1X
Taq PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of
Taq DNA polymerase per reaction. The PCR reaction was performed under the following
conditions: an initial denaturation at 94°C for 2 minutes; 6 cycles of 94°C for 30 seconds,
55°C for 45 seconds, and 72°C for 3 minutes; 25 cycles of 94°C for 30 seconds, 61°C for 45
seconds, and 72°C for 3 minutes; and a final extension for 7 minutes at 72°C. The
resulting plasmid containing the Stdds sequence under the control of the rrnB promoter
was designated pMCS2rrnBP/Stdds.

The pMCS2rrnBP/Stdds plasmid was electroporated into E. coli strain S17-1.
Using the S17-1 strain, the pMCS2rrnBP/Stdds plasmid was transferred to R. sphaeroides
35053 by conjugation. Individual colonies were purified by restreaking on LBK plates.
Single colonies were screened by PCR using the SHDDSMCSF and SHDDSMCSR
primers to confirm the presence of the insert as described above.

pMCS2glnBP/Rsdds

The nucleic acid encoding a R. sphaeroides polypeptide having DDS activity was 
cloned in the pMCS2glnBP vector as follows. The R. sphaeroides dds gene was PCR 
amplified using the following primer pair. 

RSDDSF 5'-TAGAGAATTCGAAGGAAGAGCATGGGATTGGACGAGGTTTC-3' (SEQ ID NO:151)
RSDDSR 5'-TACTACTTGTATGTAGGTACCACTTGCGGTCGGACTGATAG-3' (SEQ ID NO:152)

The forward primer introduced an EcoRI restriction site and a ribosomal binding
site that was designed based on the R. sphaeroides dxs1 gene. The reverse primer introduced
a KpnI restriction site. The following reaction mix and PCR program were used to amplify the
R. sphaeroides dds gene.



Reaction Mix                              Program

Pfu 10X buffer             10 µL          94°C   2 minutes
DMSO                        5 µL          7 cycles of:
dNTP mix (10 mM)            3 µL            94°C   30 seconds
RSDDSF (100 µM)             1 µL            55°C   45 seconds
RSDDSR (100 µM)             1 µL            72°C   3 minutes
Genomic DNA (50 ng/µL)      2 µL          25 cycles of:
Pfu enzyme (2.5 U/µL)       2 µL            94°C   30 seconds
DI water                   76 µL            62°C   45 seconds
                                            72°C   3 minutes
Total:                    100 µL          72°C   7 minutes
                                          4°C    Until used further



The PCR product was separated on a 1% TAE-agarose gel, and a fragment about
1.6 Kb in size was excised. The excised DNA was isolated using a Qiagen gel isolation
kit. The isolated DNA was restricted with EcoRI and KpnI and was column purified
using a Qiagen gel isolation kit. Three µg of pMCS2glnBP vector DNA was digested with
KpnI, and the linear DNA was gel isolated using a Qiagen gel isolation kit. The vector
was further digested with EcoRI, and the DNA was column purified. The
double-digested vector was then dephosphorylated with shrimp alkaline phosphatase and column
purified using a Qiagen gel purification kit. The KpnI/EcoRI-digested R. sphaeroides dds
PCR product with the R. sphaeroides dxs1 ribosomal binding site described above was
ligated into the prepared vector using T4 DNA ligase for 14-16 hours at 16°C. One µL of
the ligation reaction was transformed into E. coli Electromax™ DH10B™ cells, which
were then plated on LBK (25 µg/mL) media. Individual colonies were resuspended in
about 25 µL of DI water, and 2 µL of the resuspension was plated on LBK. The remaining
resuspension was heated for 10 minutes at 95°C to break open the bacterial cells, and 2
µL of the heated cells was used in a 25 µL PCR reaction using the glnBF and RSDDSR
primers. The PCR mix contained the following: 1X Taq PCR buffer, 0.2 µM each primer,
0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of Taq DNA polymerase per reaction.
The PCR reaction was performed under the following conditions: an initial denaturation
at 94°C for 2 minutes; 6 cycles of 94°C for 30 seconds, 55°C for 45 seconds, and 72°C
for 3 minutes; 25 cycles of 94°C for 30 seconds, 62°C for 45 seconds, and 72°C for 3
minutes; and a final extension for 7 minutes at 72°C. A large scale plasmid preparation
was done on a culture of a colony containing the Rsdds PCR product, and the
glnBP/Rsdds region was sequenced to confirm the lack of nucleotide errors. The
resulting plasmid containing the Rsdds sequence under the control of the glnB promoter
was designated pMCS2glnBP/Rsdds.

The pMCS2glnBP/Rsdds plasmid DNA was electroporated into electrocompetent
R. sphaeroides strain 35053 cells as well as electrocompetent carotenoid-deficient mutant
cells of 35053 (ATCC 35053/ΔcrtE). Individual colonies of both strains were screened
by PCR using the glnBF and RSDDSR primers to confirm the presence of the insert as
described above.



pMCS2glnBP/Stdds

The nucleic acid encoding a S. trueperi polypeptide having DDS activity was 
cloned in the pMCS2glnBP vector as follows. The S. trueperi dds gene was PCR 
amplified using the following primer pair. 



SHDDSECOVF 5'-GCGTGATATCGAAGGAAGAGCATGAGCGCAACCGTCCACCG-3' (SEQ ID NO:153)
SHDDSKPNR 5'-ACTGCTAGGGTCCGAGGTACCGACATGGACGAGGAAGACGC-3' (SEQ ID NO:154)

The forward primer introduced an EcoRV restriction site and a ribosomal binding
site that was designed based on the R. sphaeroides dxs1 gene. The reverse primer
introduced a KpnI restriction site. The following reaction mix and PCR program were
used to amplify the S. trueperi dds gene.



Reaction Mix                              Program

Pfu 10X buffer             10 µL          94°C   2 minutes
DMSO                        5 µL          7 cycles of:
dNTP mix (10 mM)            3 µL            94°C   30 seconds
SHDDSECOVF (100 µM)         1 µL            58°C   45 seconds
SHDDSKPNR (100 µM)          1 µL            72°C   3 minutes
Genomic DNA (50 ng/µL)      2 µL          25 cycles of:
Pfu enzyme (2.5 U/µL)       2 µL            94°C   30 seconds
DI water                   76 µL            65°C   45 seconds
                                            72°C   3 minutes
Total:                    100 µL          72°C   7 minutes
                                          4°C    Until used further



The PCR product was separated on a 1% TAE-agarose gel, and a fragment about
1.2 Kb in size was excised. The excised DNA was isolated using a Qiagen gel isolation
kit. The isolated DNA was restricted with EcoRV and KpnI and was column purified
using a Qiagen gel isolation kit. Three µg of pMCS2glnBP vector DNA was digested
with KpnI, and the linear DNA was gel isolated using a Qiagen gel isolation kit. The
vector was further digested with EcoRV, and the DNA was column purified. The
double-digested vector was then dephosphorylated with shrimp alkaline phosphatase and column
purified using a Qiagen gel purification kit. The KpnI/EcoRV-digested S. trueperi dds
PCR product with the R. sphaeroides dxs1 ribosomal binding site was ligated into the
prepared vector using T4 DNA ligase for 14-16 hours at 16°C. One µL of the ligation
reaction was transformed into E. coli Electromax™ DH10B™ cells, which were plated on
LBK (25 µg/mL) media. Individual colonies were resuspended in about 25 µL of DI
water, and 2 µL of the resuspension was plated on LBK. The remaining resuspension was
heated for 10 minutes at 95°C to break open the bacterial cells, and 2 µL of the heated
cells was used in a 25 µL PCR reaction using the glnBF and RSDDSR primers. The PCR
mix contained the following: 1X Taq PCR buffer, 0.2 µM each primer, 0.2 mM each
dNTP, 5% DMSO (v/v), and 1 unit of Taq DNA polymerase per reaction. The PCR
reaction was performed under the following conditions: an initial denaturation at 94°C for
2 minutes; 6 cycles of 94°C for 30 seconds, 58°C for 45 seconds, and 72°C for 3 minutes;
25 cycles of 94°C for 30 seconds, 65°C for 45 seconds, and 72°C for 3 minutes; and a final
extension for 7 minutes at 72°C. A large scale plasmid preparation was done on a culture
of a colony containing the Stdds PCR product, and the glnBP/Stdds region was sequenced
to confirm the lack of nucleotide errors. The resulting plasmid containing the Stdds
sequence under the control of the glnB promoter was designated pMCS2glnBP/Stdds.

The pMCS2glnBP/Stdds plasmid DNA was electroporated into electrocompetent
cells of R. sphaeroides strain 35053 and a carotenoid-deficient mutant of 35053 (ATCC
35053/ΔcrtE). Individual colonies of both strains were screened by PCR using the glnBF
and SHDDSKPNR primers to confirm the presence of the insert as described above.
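
The three forward primers in this section that are described as introducing the R. sphaeroides dxs1 ribosomal binding site (SXSCLAF2, RSDDSF, and SHDDSECOVF) share a common layout: a 5' tail carrying the restriction site, a shared stretch, and then the ATG start codon of the gene. The Python sketch below is illustrative only. It treats the 11-nucleotide stretch common to all three primers, GAAGGAAGAGC, as the introduced ribosomal binding region; that reading is an inference from comparing the primer sequences, not a statement made in the text, and the helper name annotate is hypothetical.

```python
# ASSUMPTION: the stretch shared by all three primers is taken to be the
# introduced ribosomal binding region; the text does not spell it out.
SHARED_RBS_REGION = "GAAGGAAGAGC"

# Forward primers (5' to 3') and the restriction site each one introduces.
FORWARD_PRIMERS = {
    "SXSCLAF2": ("ACTATCGATGAAGGAAGAGCATGGCTGACCTACCCAAGAC", "ATCGAT"),      # ClaI
    "RSDDSF": ("TAGAGAATTCGAAGGAAGAGCATGGGATTGGACGAGGTTTC", "GAATTC"),       # EcoRI
    "SHDDSECOVF": ("GCGTGATATCGAAGGAAGAGCATGAGCGCAACCGTCCACCG", "GATATC"),   # EcoRV
}


def annotate(primer, site):
    """Split a primer into 5' tail, shared region, and gene-specific part."""
    start = primer.find(SHARED_RBS_REGION)
    if start < 0:
        raise ValueError("shared region not found in primer")
    gene_start = start + len(SHARED_RBS_REGION)
    tail = primer[:start]
    return {
        "tail": tail,
        "site_in_tail": site in tail,
        "start_codon": primer[gene_start:gene_start + 3],
        "gene_specific": primer[gene_start:],
    }


if __name__ == "__main__":
    for name, (seq, site) in FORWARD_PRIMERS.items():
        print(name, annotate(seq, site))
```

Searching for the shared stretch rather than for the first ATG matters here, because the ClaI tail of SXSCLAF2 (ATCGATG) itself contains an ATG that is not the intended start codon.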


pMCS2tetP/Stdxs 

The nucleic acid encoding a S. trueperi polypeptide having DXS activity was
cloned in the pMCS2tetP vector as follows. The pMCS2tetP plasmid DNA was digested
with the restriction enzyme KpnI, cleaned with a QIAquick PCR Purification Kit, and
digested with the restriction enzyme ClaI. The enzyme reactions were inactivated by
heating at 65°C for 20 minutes. The digested vector DNA was then dephosphorylated
with shrimp alkaline phosphatase and gel purified on a 1% TAE-agarose gel. The
KpnI/ClaI-digested S. trueperi dxs PCR product described above with the R. sphaeroides
dxs1 ribosomal binding site was ligated into the prepared vector using T4 DNA ligase for
16 hours at 16°C. One µL of the ligation reaction was transformed into E. coli
Electromax™ DH5α™ cells, which were plated on LBK media. Individual colonies were
resuspended in about 25 µL of 10 mM Tris, and 2 µL of the resuspension was plated on
LBK. The remaining resuspension was heated for 10 minutes at 95°C to break open the
bacterial cells, and 2 µL of the heated cells was used in a 25 µL PCR reaction using the
SXSCLAF2 and SHDXSMCSR primers. The PCR mix contained the following: 1X Taq
PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of
Taq DNA polymerase per reaction. The PCR reaction was performed in an MJ Research
PTC100 under the following conditions: an initial denaturation at 94°C for 2 minutes; 8
cycles of 94°C for 30 seconds, 54°C for 1 minute, and 72°C for 3.5 minutes; 24 cycles of
94°C for 30 seconds, 60°C for 1 minute, and 72°C for 3.5 minutes; and a final extension for 7
minutes at 72°C. A large scale plasmid preparation was done on a culture of a colony
containing the S. trueperi dxs PCR product, and the tetP/Stdxs region was sequenced to
confirm the lack of nucleotide errors. The resulting plasmid containing the Stdxs
sequence under the control of the tet promoter was designated pMCS2tetP/Stdxs.

Plasmid DNA (pMCS2tetP/Stdxs) was electroporated into electrocompetent cells
of R. sphaeroides strain 35053 and a carotenoid-deficient mutant of 35053 (ATCC
35053/ΔcrtE). Individual colonies of both strains, along with an E. coli control, were
screened by PCR using the TETXBAF and STDXSMCSR primers to confirm the
presence of the insert as described above.

pMCS2tetP/Rsdds

The nucleic acid encoding a R. sphaeroides polypeptide having DDS activity was
cloned in the pMCS2tetP vector as follows. Three µg of plasmid DNA of the pMCS2tetP
vector was digested with the restriction enzyme KpnI. The digested DNA was cleaned
with a QIAquick PCR Purification Kit and digested with the restriction enzyme EcoRI,
after which the enzyme was inactivated by heating at 65°C for 20 minutes. The digested
vector DNA was then dephosphorylated with shrimp alkaline phosphatase and gel
purified. Sixty ng of vector DNA was ligated with 120 ng of the KpnI/EcoRI-digested R.
sphaeroides dds PCR product described above using T4 DNA ligase at 16°C for 16 hours.
One µL of the ligation reaction was transformed into E. coli Electromax™ DH5α™ cells,
which were then plated on LBK media. Individual colonies were resuspended in about 25
µL of 10 mM Tris, and 2 µL of the resuspension was plated on LBK. The remaining
resuspension was heated for 10 minutes at 95°C to break open the bacterial cells, and 2
µL of the heated cells was used in a 25 µL PCR reaction using the TETXBAF and
RSDDSMCSR primers. The PCR mix contained the following: 1X Taq PCR buffer, 0.2
µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of Taq DNA
polymerase per reaction. The PCR reaction was performed in an MJ Research PTC100
under the following conditions: an initial denaturation at 94°C for 2 minutes; 8 cycles of
94°C for 30 seconds, 55°C for 1 minute, and 72°C for 3 minutes; 24 cycles of 94°C for 30
seconds, 64°C for 1 minute, and 72°C for 3 minutes; and a final extension for 7 minutes at
72°C. Plasmid DNA was isolated from a colony having the desired insert, and the
tetP/Rsdds region was sequenced to confirm the lack of nucleotide errors from PCR. The
resulting plasmid containing the Rsdds sequence under the control of the tet promoter
was designated pMCS2tetP/Rsdds.

Plasmid DNA (pMCS2tetP/Rsdds) was electroporated into electrocompetent
cells of R. sphaeroides strain 35053 and the ATCC 35053/ΔcrtE strain. Individual
colonies of both strains, along with an E. coli control, were screened by PCR using the
TETXBAF and RSDDSMCSR primers to confirm the presence of the insert as described
above.

pMCS2tetP/Stdds 

The nucleic acid encoding a S. trueperi polypeptide having DDS activity was
cloned in the pMCS2tetP vector as follows. Three µg of pMCS2tetP plasmid DNA was
digested with the restriction enzyme KpnI. The digested DNA was gel purified and
digested with the restriction enzyme EcoRV. The enzyme was then inactivated by
heating at 80°C for 20 minutes, and the DNA was dephosphorylated with shrimp alkaline
phosphatase. The dephosphorylated DNA was purified using a QIAquick PCR
Purification Kit. Fifty ng of digested vector DNA was ligated with 150 ng of the
KpnI/EcoRV-digested S. trueperi dds PCR product described above using T4 DNA ligase
at 16°C for 16 hours. One µL of the ligation reaction was transformed into E. coli
Electromax™ DH10B™ cells, which were then plated on LBK media. Individual
colonies were resuspended in about 25 µL of 10 mM Tris, and 2 µL of the resuspension
was plated on LBK. The remaining resuspension was heated for 10 minutes at 95°C to
break open the bacterial cells, and 2 µL of the heated cells was used in a 25 µL PCR reaction
using the TETXBAF and STDDSMCSR primers. The PCR mix contained the following:
1X Taq PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1
unit of Taq DNA polymerase per reaction. The PCR reaction was performed in an MJ
Research PTC100 under the following conditions: an initial denaturation at 94°C for 2
minutes; 8 cycles of 94°C for 30 seconds, 55°C for 1 minute, and 72°C for 3 minutes; 24
cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C for 3 minutes; and a final
extension for 7 minutes at 72°C. Plasmid DNA was isolated from a colony having the
desired insert and was sequenced in the tetP/Stdds region to confirm the DNA sequence
of the insert. The resulting plasmid containing the Stdds sequence under the control of
the tet promoter was designated pMCS2tetP/Stdds.

Plasmid DNA (pMCS2tetP/Stdds) was electroporated into electrocompetent cells
of R. sphaeroides strain 35053 and the ATCC 35053/ΔcrtE strain. Individual colonies of
both strains, along with an E. coli control, were screened by PCR using the TETXBAF
and STDDSMCSR primers to confirm the presence of the insert as described above.

pMCS2tetP/Stdxs/Rsdds 

Nucleic acid encoding a S. trueperi polypeptide having DXS activity as well as
nucleic acid encoding a R. sphaeroides polypeptide having DDS activity was cloned into
the pMCS2tetP vector as follows. A vector containing both the S. trueperi dxs gene and
the R. sphaeroides dds gene, each behind a tet promoter, was constructed using the
pMCS2tetP/Stdxs construct described above as the starting vector. This vector was
digested with restriction enzyme XbaI, cleaned with a QIAquick PCR Purification Kit,
and digested with the restriction enzyme Bpu10I (Fermentas, Hanover, MD). The
enzyme reaction was inactivated by heating for 20 minutes at 80°C. The digested vector
DNA was then dephosphorylated using shrimp alkaline phosphatase and gel purified on a
1% TAE-agarose gel.

A PCR product containing a tet promoter region followed by an R. sphaeroides dds
gene was amplified using the pMCS2tetP/Rsdds construct described above as template.
The PCR mix contained the following: 1X Native Plus Pfu buffer, 5 ng plasmid template,
0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 10 units of native Pfu
DNA polymerase in a final volume of 200 µL. The PCR reaction was performed in an MJ
Research PTC100 under the following conditions: an initial denaturation at 94°C for 2
minutes; 8 cycles of 94°C for 30 seconds, 55°C for 1 minute, and 72°C for 3 minutes; 24
cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C for 3 minutes; and a final
extension for 7 minutes at 72°C. The amplification product was then separated by gel
electrophoresis using a 1% TAE-agarose gel. A 1.6 Kb fragment was excised from the
gel and purified. The purified fragment was digested with Bpu10I, cleaned with a
QIAquick PCR Purification Kit, digested with XbaI restriction enzyme, purified again
with a QIAquick PCR Purification Kit, and quantified on a minigel.
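
As a rough planning aid, the two-stage cycling program described above (8 cycles annealing at 55°C, then 24 cycles annealing at 64°C) can be tallied in a few lines of Python. This is only a minimal sketch and not part of the original protocol: the `Step` and `Stage` names are illustrative, and the total counts programmed hold times only, ignoring the ramp times of the thermal cycler.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    temp_c: float   # block temperature in degrees Celsius
    minutes: float  # programmed hold time


@dataclass
class Stage:
    cycles: int
    steps: List[Step]

    def hold_minutes(self) -> float:
        """Total programmed hold time for all cycles of this stage."""
        return self.cycles * sum(step.minutes for step in self.steps)


def three_step_cycle(anneal_c: float, extend_min: float) -> List[Step]:
    """Denature at 94 C for 30 s, anneal for 1 min, extend at 72 C."""
    return [Step(94, 0.5), Step(anneal_c, 1.0), Step(72, extend_min)]


# Values taken from the tetP/Rsdds amplification described in the text.
program = [
    Stage(1, [Step(94, 2.0)]),            # initial denaturation
    Stage(8, three_step_cycle(55, 3.0)),  # first 8 cycles
    Stage(24, three_step_cycle(64, 3.0)), # remaining 24 cycles
    Stage(1, [Step(72, 7.0)]),            # final extension
]

total_cycles = sum(stage.cycles for stage in program if len(stage.steps) == 3)
total_minutes = sum(stage.hold_minutes() for stage in program)
print(f"{total_cycles} cycles, {total_minutes:.0f} min of programmed holds")
```

The same structure can be reused for the other programs in this example by changing the annealing temperature and extension time passed to `three_step_cycle`.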

Sixty ng of the prepared pMCS2tetP/Stdxs vector was ligated with 70 ng of the
digested tetP/Rsdds PCR product using T4 DNA ligase at 16°C for 16 hours. One µL of the
ligation reaction was used to electroporate 40 µL of E. coli Electromax™ DH5α™ cells.
Electroporated cells were plated on LBK media. Individual colonies were screened by
PCR using the RSDDSMCSF and STDXSMCSR primers, which produced a 4.1 Kb
band. Individual colonies were resuspended in about 25 µL of 10 mM Tris, and 2 µL of
the resuspension was plated on LBK. The remnant resuspension was heated for 10
minutes at 95°C to break open the bacterial cells, and 2 µL of the heated cells was used in
a 25 µL PCR reaction. The PCR reaction mix contained 0.2 µM each primer, 1X Genome
Advantage (Clontech, Palo Alto, CA) reaction buffer, 1 M GCMelt, 1.1 mM Mg(OAc)2,
0.2 mM each dNTP, and 1X Genome Advantage Polymerase. The PCR was conducted in
an MJ Research PTC100 and consisted of an initial denaturation at 94°C for 1.5 minutes;
32 cycles of a 30 second denaturation at 94°C, a 1 minute annealing at 60°C, and a 6.5
minute extension at 72°C; followed by a final extension at 72°C for 5 minutes. A
large-scale plasmid prep was done for a colony that had the desired insert, and plasmid DNA
was sequenced through the tetP/Rsdds region to confirm the lack of nucleotide errors
from PCR. The resulting plasmid containing the Stdxs sequence under the control of the
tet promoter and the Rsdds sequence under the control of the tet promoter was designated
pMCS2tetP/Stdxs/Rsdds.

Plasmid DNA (pMCS2tetP/Stdxs/Rsdds) was electroporated into
electrocompetent cells of R. sphaeroides strains 35053 and ATCC 35053/ΔcrtE.
Individual colonies of both strains, along with an E. coli control, were screened by PCR
using the RSDDSMCSF and STDDSMCSR primers to confirm the presence of the insert
as described above.

pMCS2tetP/Stdxr

Nucleic acid encoding an S. trueperi polypeptide having DXR activity was cloned
into the pMCS2tetP vector as follows. The S. trueperi dxr gene was amplified using
genomic DNA as template. The following primers were designed to introduce an EcoRV
restriction site and a ribosomal binding site based on the R. sphaeroides dxs1 gene at the
beginning of the amplified fragment and a KpnI site at the end of the amplified fragment.

SXRRVF 5'-GATGATATCGAAGGAAGAGCATGGTGAAGCGCGTCACGGTGT-3' (SEQ ID NO:155)
SXRKPNR 5'-CAAGAGTCAGAAGGTACCCGCCAGAATGGTGAGCAGGATG-3' (SEQ ID NO:156)
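
The primer design described above can be sanity-checked by searching each primer for the recognition sequence it is meant to introduce. The following Python sketch is illustrative only: the recognition sequences (EcoRV, GATATC; KpnI, GGTACC) are the standard ones for these enzymes, the primer strings are the sequences given above with line-wrap hyphens and spaces removed, and the function names are hypothetical. Because both sites are palindromic, searching the primer strand alone is sufficient.

```python
# Standard recognition sequences for the two enzymes named in the text.
SITES = {"EcoRV": "GATATC", "KpnI": "GGTACC"}

# Primer sequences as listed for SEQ ID NO:155 and SEQ ID NO:156,
# with the line-wrap hyphen and stray spaces removed.
PRIMERS = {
    "SXRRVF": "GATGATATCGAAGGAAGAGCATGGTGAAGCGCGTCACGGTGT",
    "SXRKPNR": "CAAGAGTCAGAAGGTACCCGCCAGAATGGTGAGCAGGATG",
}


def site_positions(sequence, site):
    """Return every 0-based index at which `site` occurs in `sequence`."""
    sequence = sequence.upper()
    width = len(site)
    return [i for i in range(len(sequence) - width + 1)
            if sequence[i:i + width] == site]


def sites_in_primers(primers, sites):
    """Map each primer name to the positions of each enzyme site in it."""
    return {
        name: {enzyme: site_positions(seq, site) for enzyme, site in sites.items()}
        for name, seq in primers.items()
    }


report = sites_in_primers(PRIMERS, SITES)
for name, hits in report.items():
    print(name, hits)
```

The forward primer should report a single EcoRV site and the reverse primer a single KpnI site, each near the 5' end, which matches the stated design.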

The PCR mix contained the following: 1X Native Plus Pfu buffer, 200 ng
genomic DNA, 0.2 µM of each primer, 0.2 mM of each dNTP, 5% DMSO (v/v), and 10
units of native Pfu DNA polymerase in a final volume of 200 µL. The PCR reaction was
performed in an MJ Research PTC100 under the following conditions: an initial
denaturation at 94°C for 2 minutes; 8 cycles of 94°C for 30 seconds, 59°C for 1 minute,
and 72°C for 3 minutes; 24 cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C
for 3 minutes; and a final extension for 7 minutes at 72°C. The amplification product was
then separated by gel electrophoresis using a 1% TAE-agarose gel. A 1.0 Kb fragment was
excised from the gel and purified. The purified fragment was digested simultaneously
with EcoRV and KpnI restriction enzymes, purified with a QIAquick PCR Purification
Kit, and checked on a minigel.

Fifty ng of the EcoRV/KpnI-digested pMCS2tetP vector described above for the
pMCS2tetP/Stdds construct was ligated with 75 ng of the digested S. trueperi dxr PCR
product using T4 DNA ligase at 20°C for 4 hours. One µL of the ligation reaction was used
to electroporate 40 µL of E. coli Electromax™ DH10B™ cells, which were then plated
on LBK media. Individual colonies were selected and screened by PCR using the
TETXBAF and SXRKPNR primers. The PCR mix contained the following: 1X Taq PCR
buffer, 200 ng genomic DNA, 0.2 µM of each primer, 0.2 mM of each dNTP, 5% DMSO
(v/v), and 1 unit of Taq DNA polymerase per 25 µL reaction. The PCR reaction was
performed in an MJ Research PTC100 under the following conditions: an initial
denaturation at 94°C for 2 minutes; 32 cycles of 94°C for 30 seconds, 64°C for 1 minute,
and 72°C for 3 minutes; and a final extension for 7 minutes at 72°C. A large-scale plasmid
preparation was done for a colony that had the desired insert, and the tetP/Stdxr region
was sequenced to confirm the DNA sequence of the insert. The resulting plasmid
containing the Stdxr sequence under the control of the tet promoter was designated
pMCS2tetP/Stdxr.

Plasmid DNA (pMCS2tetP/Stdxr) was electroporated into electrocompetent cells
of R. sphaeroides strains 35053 and ATCC 35053/ΔcrtE. Individual colonies of both
strains, along with an E. coli control, were screened by PCR using the TETXBAF and
SXRKPNR primers to confirm the presence of the insert as described above.

pMCS2tetP/Stdxr/Stdds

Nucleic acid encoding an S. trueperi polypeptide having DXR activity as well as
nucleic acid encoding an S. trueperi polypeptide having DDS activity was cloned into the
pMCS2tetP vector as follows. A vector containing both the S. trueperi dxr and dds genes,
each behind a tet promoter, was constructed using the pMCS2tetP/Stdds construct
described above as the starting vector. This vector was digested with restriction enzyme
XbaI, cleaned with a QIAquick PCR Purification Kit, and digested with the restriction
enzyme Bpu10I (Fermentas). The enzyme reaction was inactivated by heating for 20
minutes at 80°C. The digested vector DNA was then dephosphorylated with shrimp
alkaline phosphatase and gel purified.

A PCR product containing a tet promoter region followed by an S. trueperi dxr
gene was amplified using the pMCS2tetP/Stdxr construct described above as template
and primers TETBPUF and SXRXBAR. The SXRXBAR primer, having the following
sequence, was designed to introduce an XbaI restriction site on the end of the PCR
product.

SXRXBAR 5'-CAAGAGTCAGAATCTAGACGCCAGAATGGTGAGCAGGATG-3' (SEQ ID NO:157)

The PCR mix contained the following: 1X Native Plus Pfu buffer, 5 ng plasmid
template, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 10 units of
native Pfu DNA polymerase in a final volume of 200 µL. The PCR reaction was
performed in an MJ Research PTC100 under the following conditions: an initial
denaturation at 94°C for 2 minutes; 8 cycles of 94°C for 30 seconds, 59°C for 1 minute,
and 72°C for 3.5 minutes; 24 cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C
for 3.5 minutes; and a final extension for 7 minutes at 72°C. The amplification product was
then separated by gel electrophoresis using a 1% TAE-agarose gel. A 1.4 Kb fragment
was excised from the gel and purified. The purified fragment was digested with Bpu10I,
cleaned with a QIAquick PCR Purification Kit, digested with XbaI restriction enzyme,
purified again with a QIAquick PCR Purification Kit, and quantified on a minigel.

Sixty ng of the prepared pMCS2tetP/Stdds vector was ligated with 80 ng of the
digested tetP/Stdxr PCR product using T4 DNA ligase at 16°C for 16 hours. One µL of the
ligation reaction was used to electroporate 40 µL of E. coli Electromax™ DH10B™ cells,
which were then plated on LBK media. Individual colonies were screened by PCR using
the SXREVF and SDSKPNR primers. Colonies were resuspended in about 25 µL of 10
mM Tris, and 2 µL of the resuspension was plated on LBK media. The remnant
resuspension was heated for 10 minutes at 95°C to break open the bacterial cells, and 2
µL of the heated cells was used in a 25 µL PCR reaction. The PCR mix contained the
following: 1X Taq PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO
(v/v), and 1 unit of Taq DNA polymerase per reaction. The PCR reaction was performed
in an MJ Research PTC100 under the following conditions: an initial denaturation at 94°C
for 2 minutes; 8 cycles of 94°C for 30 seconds, 58°C for 1 minute, and 72°C for 4.5
minutes; 24 cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C for 4.5 minutes;
and a final extension for 7 minutes at 72°C. A large-scale plasmid preparation was done
for a colony that had the desired insert, and the tetP/Stdxr region was sequenced to confirm
the lack of nucleotide errors from PCR. The resulting plasmid containing the Stdxr sequence
under the control of the tet promoter and the Stdds sequence under the control of the tet
promoter was designated pMCS2tetP/Stdxr/Stdds.

Plasmid DNA (pMCS2tetP/Stdxr/Stdds) was electroporated into electrocompetent
cells of R. sphaeroides strains 35053 and ATCC 35053/ΔcrtE. Individual colonies of
both strains, along with an E. coli control, were screened by PCR using the SXREVF and
SDSKPNR primers to confirm the presence of the insert as described above.

pMCS2tetP/EcUbiC 

Nucleic acid encoding an E. coli polypeptide having chorismate lyase activity was
cloned into the pMCS2tetP vector as follows. The E. coli ubiC gene was amplified using
genomic DNA from E. coli strain DH10B as template. The following primers were
designed to introduce an EcoRV restriction site and a ribosomal binding site based on the
R. sphaeroides dxs1 gene at the beginning of the amplified fragment, and a KpnI site at
the end of the amplified fragment.

UBICRVF 5'-CTAGATATCGGAAGGAAGAGCATGTCACACCCCGCGTTA-3' (SEQ ID NO:158)
UBICKPNR 5'-TCAGGTACCGTGTCGCCACCCACAACGCCCATAATG-3' (SEQ ID NO:159)

The PCR mix contained the following: 1X Native Plus Pfu buffer, 200 ng
genomic DNA, 0.2 µM each primer, 0.2 mM each dNTP, and 10 units of native Pfu DNA
polymerase in a final volume of 200 µL. The PCR reaction was performed in an MJ
Research PTC100 under the following conditions: an initial denaturation at 94°C for 2
minutes; 8 cycles of 94°C for 30 seconds, 57°C for 1 minute, and 72°C for 2.5 minutes;
24 cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C for 2.5 minutes; and a final
extension for 7 minutes at 72°C. The amplification product was then separated by gel
electrophoresis using a 1.5% TAE-agarose gel. A 650 bp fragment was excised from the
gel and purified. The purified fragment was digested with EcoRV, cleaned with a
QIAquick PCR Purification Kit, digested with KpnI restriction enzyme, purified again
with a QIAquick PCR Purification Kit, and quantified on a minigel.

Seventy-five ng of the EcoRV/KpnI-digested pMCS2tetP vector described above
for the pMCS2tetP/Stdds construct was ligated with 70 ng of the digested ubiC PCR
product using T4 DNA ligase at 16°C for 16 hours. One µL of the ligation reaction was used
to electroporate 40 µL of E. coli Electromax™ DH5α™ cells, which were then plated on
LBK media. Individual colonies were resuspended in about 25 µL of 10 mM Tris, and 2
µL of the resuspension was plated on LBK. The remnant resuspension was heated for 10
minutes at 95°C to break open the bacterial cells, and 2 µL of the heated cells was used in
a 25 µL PCR reaction using the TETXBAF and UBICKPNR primers. The PCR mix contained
the following: 1X Taq PCR buffer, 0.2 µM each primer, 0.2 mM each dNTP, and 1 unit
of Taq DNA polymerase per reaction. The PCR reaction was performed in an MJ
Research PTC100 under the following conditions: an initial denaturation at 94°C for 2
minutes; 32 cycles of 94°C for 30 seconds, 62°C for 1 minute, and 72°C for 2 minutes;
and a final extension for 7 minutes at 72°C. A large-scale plasmid preparation was done
for a colony that had the desired insert, and the tetP/ubiC region was sequenced to confirm
the DNA sequence of the insert. The resulting plasmid containing the UbiC sequence
under the control of the tet promoter was designated pMCS2tetP/EcUbiC.

Plasmid DNA (pMCS2tetP/EcUbiC) was electroporated into electrocompetent
cells of R. sphaeroides strain 35053 and the ATCC 35053/ΔcrtE strain. Individual
colonies of both strains, along with an E. coli control, were screened by PCR using the
TETXBAF and UBICKPNR primers to confirm the presence of the insert as described
above with the addition of 5% DMSO (v/v) to the PCR reaction.

pMCS2tetP/Stdxs/Rsdds/EcUbiC

Nucleic acid encoding an S. trueperi polypeptide having DXS activity, nucleic
acid encoding an R. sphaeroides polypeptide having DDS activity, and nucleic acid
encoding an E. coli polypeptide having chorismate lyase activity were cloned into the
pMCS2tetP vector as follows. A vector containing the S. trueperi dxs gene, the R.
sphaeroides dds gene, and the E. coli ubiC gene, each behind a tet promoter, was
constructed using the pMCS2tetP/Stdxs/Rsdds construct described above as the starting
vector. This vector was digested with restriction enzyme KpnI, cleaned with a QIAquick
PCR Purification Kit, and digested with the restriction enzyme NsiI. The enzyme
reaction was inactivated by heating for 20 minutes at 65°C. The digested vector DNA
was then dephosphorylated with shrimp alkaline phosphatase and gel purified.

A PCR product containing a tet promoter region followed by an E. coli ubiC gene
was amplified using the pMCS2tetP/EcUbiC construct described above as template. The
following primers were designed to introduce a KpnI restriction site at the beginning of
the amplified fragment and an NsiI site at the end of the amplified fragment.

TETKPNF 5'-TAGGGTACCACCGTCTACGCCGACCTCGTTCAAC-3' (SEQ ID NO:160)
UBICNSIR 5'-TGTATGCATGTCGCCACCCACAACGCCCATAATG-3' (SEQ ID NO:161)

The PCR mix contained the following: 1X Native Plus Pfu buffer, 5 ng plasmid
template, 0.2 µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 10 units of
native Pfu DNA polymerase in a final volume of 200 µL. The PCR reaction was
performed in an MJ Research PTC100 under the following conditions: an initial
denaturation at 94°C for 2 minutes; 8 cycles of 94°C for 30 seconds, 62°C for 1 minute,
and 72°C for 2.5 minutes; 24 cycles of 94°C for 30 seconds, 66°C for 1 minute, and 72°C
for 2.5 minutes; and a final extension for 7 minutes at 72°C. The amplification product was
then separated by gel electrophoresis using a 1% TAE-agarose gel. An 850 bp fragment
was excised from the gel and purified. The purified fragment was digested with the
restriction enzyme NsiI, cleaned with a QIAquick PCR Purification Kit, digested with the
restriction enzyme KpnI, purified again with a QIAquick PCR Purification Kit, and
quantified on a minigel.

Fifty ng of the prepared pMCS2tetP/Stdxs/Rsdds vector was ligated with 35 ng of
the digested tetP/ubiC PCR product using T4 DNA ligase at 16°C for 16 hours. One µL
of the ligation reaction was used to electroporate 40 µL of E. coli Electromax™ DH10B™
cells, which were then plated on LBK media. Individual colonies were resuspended in
about 25 µL of 10 mM Tris, and 2 µL of the resuspension was plated on LBK. The
remnant resuspension was heated for 10 minutes at 95°C to break open the bacterial cells,
and 2 µL of the heated cells was used in a 25 µL PCR reaction using the SXSCLAF2 and
UBICNSIR primers. The PCR reaction mix contained 1X GC-RICH PCR reaction
buffer, 1.0 M GC-RICH resolution solution, 0.2 µM each primer, 0.2 mM each dNTP,
and 1 unit of GC-RICH enzyme mix per reaction (Roche). The PCR reaction was
performed in an MJ Research PTC100 under the following conditions: an initial
denaturation at 94°C for 2 minutes; 8 cycles of 94°C for 30 seconds, 60°C for 1 minute,
and 72°C for 5 minutes; 24 cycles of 94°C for 30 seconds, 64°C for 1 minute, and 72°C
for 5 minutes; and a final extension for 7 minutes at 72°C. A large-scale plasmid preparation
was done for a colony that had the desired insert, and plasmid DNA was sequenced
through the tetP/ubiC region to confirm the lack of nucleotide errors from PCR. The
resulting plasmid containing the Stdxs sequence under the control of the tet promoter, the
Rsdds sequence under the control of the tet promoter, and the UbiC sequence under the
control of the tet promoter was designated pMCS2tetP/Stdxs/Rsdds/EcUbiC.

Plasmid DNA (pMCS2tetP/Stdxs/Rsdds/EcUbiC) was electroporated into
electrocompetent cells of R. sphaeroides strains 35053 and ATCC 35053/ΔcrtE.
Individual colonies of both strains, along with an E. coli control, were screened by PCR
using the SXSCLAF2 and UBICNSIR primers to confirm the presence of the insert as
described above.

pMCS2tetP/RsLytB

Nucleic acid encoding an R. sphaeroides LytB polypeptide was cloned into the
pMCS2tetP vector as follows. The R. sphaeroides lytB gene was identified by TBLASTN
analysis of its genome using an E. coli lytB sequence as a query. Based on the identified
sequence, the following primers were designed to PCR amplify the gene:

LYTBHINDF 5'-GACGAAGCTTGAAGGAAGAGCATGCCTCCCCTCACCCTCTATC-3' (SEQ ID NO:162)
LYTBKPNR 5'-GTCACTGAATGAATGGTACCGCAGCCGAGAACCGCCAGAAGCC-3' (SEQ ID NO:163)

The primers introduced a HindIII restriction site and ribosomal binding site at the
5' end, and a KpnI restriction site at the 3' end. The following reaction mix and PCR
program were used to amplify the lytB gene.



Reaction Mix                               Program

Pfu 10X buffer              10 µL          94°C   2 minutes
DMSO                         5 µL          7 cycles of:
dNTP mix (10 mM)             3 µL            94°C   30 seconds
LYTBHINDF (100 µM)           1 µL            59°C   45 seconds
LYTBKPNR (100 µM)            1 µL            72°C   3 minutes
Genomic DNA (50 ng/µL)       2 µL          25 cycles of:
Pfu enzyme (2.5 U/µL)        2 µL            94°C   30 seconds
DI water                    76 µL            66°C   45 seconds
                                             72°C   3 minutes
Total:                     100 µL          72°C   7 minutes
                                           4°C    Until used further
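
The reaction mix listed above can be checked arithmetically: the component volumes should add up to the stated 100 µL total, and the final concentration of each component follows from simple dilution of its stock. The Python sketch below is an illustration only; the dictionary keys and the `final_amount` helper are hypothetical names, and it assumes the "10 mM" and "100 µM" figures are stock concentrations of the dNTP mix and primers, as the listing suggests.

```python
# Component volumes in microliters, copied from the lytB reaction mix.
MIX_UL = {
    "Pfu 10X buffer": 10,
    "DMSO": 5,
    "dNTP mix (10 mM)": 3,
    "LYTBHINDF (100 uM)": 1,
    "LYTBKPNR (100 uM)": 1,
    "Genomic DNA (50 ng/uL)": 2,
    "Pfu enzyme (2.5 U/uL)": 2,
    "DI water": 76,
}

TOTAL_UL = sum(MIX_UL.values())


def final_amount(stock, volume_ul, total_ul=TOTAL_UL):
    """Concentration after diluting `volume_ul` of stock into `total_ul`."""
    return stock * volume_ul / total_ul


final = {
    "buffer_x": final_amount(10, MIX_UL["Pfu 10X buffer"]),         # fold (X)
    "dmso_percent": 100 * MIX_UL["DMSO"] / TOTAL_UL,                # % v/v
    "dntp_mM": final_amount(10, MIX_UL["dNTP mix (10 mM)"]),        # mM
    "primer_uM": final_amount(100, MIX_UL["LYTBHINDF (100 uM)"]),   # uM each
    "template_ng": 50 * MIX_UL["Genomic DNA (50 ng/uL)"],           # ng total
    "pfu_units": 2.5 * MIX_UL["Pfu enzyme (2.5 U/uL)"],             # units
}

print("total volume (uL):", TOTAL_UL)
for name, value in final.items():
    print(name, value)
```

Under these assumptions the mix works out to 1X buffer and 5% DMSO, in line with the other Pfu reactions described in this example.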



The PCR product was run on a 1% TAE-agarose gel, and a fragment about 1.1 Kb
in size was excised. The excised DNA was isolated using a Qiagen gel isolation kit. The
isolated DNA was restricted with HindIII and KpnI, and was column purified using a
Qiagen gel isolation kit. Two µg of pMCS2tetP vector DNA was digested with HindIII,
and the linear DNA was gel isolated using a Qiagen gel isolation kit. The vector was
further digested with KpnI, and the DNA was column purified. The double-digested
vector was then dephosphorylated with shrimp alkaline phosphatase and column purified
using a Qiagen gel purification kit. The KpnI/HindIII-digested R. sphaeroides lytB PCR
product with the R. sphaeroides dxs1 ribosomal binding site described above was ligated
into the prepared vector using T4 DNA ligase for 14-16 hours at 16°C. One µL of the
ligation reaction was transformed into E. coli Electromax™ DH10B™ cells, which were
then plated on LBK (25 µg/mL) media. Individual colonies were resuspended in about 25
µL of DI water, and 2 µL of the resuspension was plated on LBK. The remnant
resuspension was heated for 10 minutes at 95°C to break open the bacterial cells, and 2
µL of the heated cells was used in a 25 µL PCR reaction using the LYTBHINDF and
LYTBKPNR primers. The PCR mix contained the following: 1X Taq PCR buffer, 0.2
µM each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of Taq DNA
polymerase per reaction. The PCR reaction was performed under the following
conditions: an initial denaturation at 94°C for 2 minutes; 8 cycles of 94°C for 30 seconds,
59°C for 1 minute, and 72°C for 3 minutes; 24 cycles of 94°C for 30 seconds, 66°C for 1
minute, and 72°C for 3 minutes; and a final extension for 7 minutes at 72°C. A
large-scale plasmid preparation was done on a culture of a colony containing the lytB PCR
product, and the tetP/lytB region was sequenced to confirm the lack of nucleotide errors.
The resulting plasmid containing the RsLytB sequence under the control of the tet
promoter was designated pMCS2tetP/RsLytB.

Plasmid DNA (pMCS2tetP/RsLytB) was electroporated into electrocompetent
cells of R. sphaeroides strain 35053 and a carotenoid-deficient mutant of 35053 (ATCC
35053/ΔcrtE). Individual colonies of both strains, along with an E. coli control, were
screened by PCR using the TETXBAF and LYTBKPNR primers to confirm the presence
of the insert as described above.

pMCS2tetP/Stdxs/Rsdds/RsLytB

Nucleic acid encoding an S. trueperi polypeptide having DXS activity, nucleic
acid encoding an R. sphaeroides polypeptide having DDS activity, and nucleic acid
encoding LytB from R. sphaeroides were cloned into the pMCS2tetP vector as follows.
The R. sphaeroides lytB gene was cloned and expressed along with the R. sphaeroides
dds and S. trueperi dxs genes. In this triple expression system, each gene was expressed
through its own tetP. The R. sphaeroides lytB gene was PCR amplified along with the
tetP using the following primers.



TETKPNF 5'-TAGGGTACCACCGTCTACGCCGACCTCGTTGAAC-3' (SEQ ID NO:164)
LYTBNSIR 5'-AGGCAATGCATGCAGCCGAGAACCGCCAGAAGCC-3' (SEQ ID NO:165)

The following PCR mix and program were used to PCR amplify the lytB gene
along with the tetP.

Reaction Mix                               Program

Pfu 10X buffer              10 µL          94°C   2 minutes
DMSO                         5 µL          7 cycles of:
dNTP mix (10 mM)             3 µL            94°C   30 seconds
TETKPNF (100 µM)             1 µL            63°C   45 seconds
LYTBNSIR (100 µM)            1 µL            72°C   3 minutes
pMCS2tetP/lytB (10 ng/µL)    1 µL          25 cycles of:
Pfu enzyme (2.5 U/µL)        2 µL            94°C   30 seconds
DI water                    77 µL            69°C   45 seconds
                                             72°C   3 minutes
Total:                     100 µL          72°C   7 minutes
                                           4°C    Until used further

In this PCR reaction, pMCS2tetP/RsLytB plasmid DNA was used as a template.
The PCR product was separated on a 1% TAE-agarose gel, and a fragment about 1.4 Kb
in size was excised. The excised DNA was isolated using a Qiagen gel isolation kit. The
isolated DNA was restricted with NsiI and KpnI, and was column purified using a Qiagen
gel isolation kit. Two µg of pMCS2tetP/Stdxs/Rsdds plasmid DNA was digested with
NsiI, and the linear DNA was gel isolated using a Qiagen gel isolation kit. The vector
was further digested with KpnI, and the DNA was column purified. The double-digested
vector was then dephosphorylated with shrimp alkaline phosphatase and column purified
using a Qiagen gel purification kit. The KpnI/NsiI-digested PCR product was ligated into
the prepared plasmid using T4 DNA ligase for 14-16 hours at 16°C. One µL of the
ligation reaction was transformed into E. coli Electromax™ DH10B™ cells, which were
then plated on LBK (25 µg/mL) media. Individual colonies were resuspended in about 25
µL of DI water, and 2 µL of the resuspension was plated on LBK. The remnant
resuspension was heated for 10 minutes at 95°C to break open the bacterial cells, and 2
µL of the heated cells was used in a 25 µL PCR reaction using the SXSCLAF2 and
LYTBNSIR primers. The PCR mix contained the following: 1X Taq PCR buffer, 0.2 µM
each primer, 0.2 mM each dNTP, 5% DMSO (v/v), and 1 unit of Taq DNA polymerase
per reaction. The PCR reaction was performed under the following conditions: an initial
denaturation at 94°C for 2 minutes; 6 cycles of 94°C for 30 seconds, 59°C for 45 seconds,
and 72°C for 4 minutes; 25 cycles of 94°C for 30 seconds, 65°C for 45 seconds, and 72°C
for 4 minutes; and a final extension for 7 minutes at 72°C. A large-scale plasmid preparation
was done on a culture of a colony containing the correct insert, and the tetP/lytB region
was sequenced to confirm the lack of nucleotide errors. The resulting plasmid containing
the Stdxs sequence under the control of the tet promoter, the Rsdds sequence under the
control of the tet promoter, and the LytB sequence under the control of the tet promoter
was designated pMCS2tetP/Stdxs/Rsdds/RsLytB.

Plasmid DNA (pMCS2tetP/Stdxs/Rsdds/RsLytB) was electroporated into
electrocompetent cells of R. sphaeroides strain 35053 and a carotenoid-deficient mutant
of 35053 (ATCC 35053/ΔcrtE). Individual colonies of both strains were screened by
PCR using the SXSCLAF2 and LYTBNSIR primers to confirm the presence of the insert
as described above.

Example 9 - Making recombinant microorganisms containing knock-outs

Various nucleic acid sequences within the R. sphaeroides genome were knocked
out. All restriction enzymes and T4 DNA ligase were obtained from New England
Biolabs (Beverly, MA) unless otherwise indicated. All plasmid DNA preparations were
done using QIAprep Spin Miniprep Kits or Qiagen Maxi Prep Kits, and all gel
purifications were done using QIAquick Gel Extraction Kits (Qiagen, Valencia, CA).

ATCC 35053/ΔcrtE(kan)

R. sphaeroides cells lacking crtE were made by inserting a kanamycin resistance
gene into the crtE sequence as follows. In general, the crtE gene from R. sphaeroides was
cloned into a pUC19 vector, and a kanamycin resistance gene (kan) was inserted into the
gene to inactivate it. The crtE-kan insert was amplified by PCR and cloned into pSUP203, a
mobilizable ColE1-based plasmid that is not maintained in R. sphaeroides unless it is
integrated into an R. sphaeroides replicon. This plasmid was transformed into E. coli
strain S17-1, a strain that is able to mobilize oriT-containing plasmids in conjugations
with a second bacterial strain. The S17-1 strain was conjugated with R. sphaeroides
strain 35053, and colonies were identified in which the crtE-kan insert had replaced the
native crtE gene.

The crtE gene from R. sphaeroides strain 17023 was amplified by PCR using
primers designed to introduce an SphI restriction site at the beginning of the amplified
fragment and an XbaI restriction site at the end of the amplified fragment. The sequences
of the primers were as follows.

CRTESPHF 5'-AAGCATGCGAAAAAGTTGACACCTGTGGAGTC-3' (SEQ ID NO:166)
CRTEXBAR 5'-ACTCTAGAAGCACCTGCGAATGGACGAAG-3' (SEQ ID NO:167)

The fragment amplified included the crtE gene along with 85 nucleotides
upstream of the translational start codon and 228 nucleotides downstream of the
translational stop codon. The PCR reaction mix contained 0.2 µM each primer, 1X GC
Genomic PCR Buffer (Clontech, Palo Alto, CA), 1 M GC-Melt, 1.1 mM Mg(OAc)2, 0.2
mM each dNTP, 1X Advantage-GC Genomic Polymerase Mix, and 1 ng of genomic
DNA per µL of reaction mix. The PCR was conducted in a Perkin Elmer Geneamp 2400
and consisted of an initial denaturation at 94°C for 30 seconds; 35 cycles of a 15 second
denaturation at 94°C, a one minute annealing at 55°C, and a 3 minute extension at 72°C;
followed by a final extension at 72°C for 5 minutes. Fifty µL of PCR product was
separated on a 1% Tris-Acetate-EDTA (TAE)-agarose gel. An 1180 bp fragment was gel
purified, and the purified DNA was digested with XbaI and SphI restriction enzymes
(Promega, Madison, WI).

pUC19 vector was digested with the restriction enzymes SphI and XbaI, and gel
purified on a 1% TAE-agarose gel. Fifty ng of purified vector was ligated with about
150 ng of digested crtE PCR product for 16 hours at 14°C using T4 DNA ligase (Roche
Molecular Biochemicals, Indianapolis, IN). One µL of the ligation reaction was transformed
into ElectroMAX™ DH10B™ cells (Life Technologies, Gaithersburg, MD), which were
then plated on LB media containing 100 µg/mL ampicillin and 50 µg/mL of
5-Bromo-4-Chloro-3-Indolyl-β-D-Galactopyranoside (LBKX). Individual white colonies were
resuspended in about 20 µL of 10 mM Tris, and 2 µL of the resuspension was plated on
LBKX media. The remnant resuspension was heated for 10 minutes at 95°C to break
open the bacterial cells, and 2 µL of the heated cells was used in a 25 µL PCR reaction
using the CRTESPHF and CRTEXBAR primers. The PCR reaction mix contained 0.2
µM each primer, 1X GC Genomic PCR Buffer, 1 M GCMelt, 1.1 mM Mg(OAc)2, 0.2
mM each dNTP, and 1X Advantage-GC Genomic Polymerase Mix. The PCR was
conducted in a Perkin Elmer Geneamp 2400 and consisted of an initial denaturation at
94°C for 30 seconds; 35 cycles of a 15 second denaturation at 94°C, a one minute
annealing at 55°C, and a 3 minute extension at 72°C; followed by a final extension at
72°C for 5 minutes. Plasmid DNA was isolated from colonies having a crtE gene insert and
was digested with the restriction enzyme HindIII and with a mixture of SphI and XbaI to
confirm vector structure.
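
The ligation described above used 50 ng of vector and about 150 ng of insert; the corresponding insert-to-vector molar ratio can be estimated from the fragment lengths. The sketch below is an approximation and not part of the original protocol. It assumes the published pUC19 length of 2686 bp (a value not stated in this text), uses the 1180 bp fragment size given above, ignores the few bases removed by the SphI/XbaI digests, and takes an average mass of 650 g/mol per base pair of double-stranded DNA.

```python
PUC19_BP = 2686          # published length of pUC19; not stated in the text
CRTE_FRAGMENT_BP = 1180  # crtE PCR fragment size given in the text
G_PER_MOL_PER_BP = 650.0 # common approximation for double-stranded DNA


def ng_to_pmol(mass_ng, length_bp, g_per_mol_bp=G_PER_MOL_PER_BP):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    return mass_ng * 1000.0 / (length_bp * g_per_mol_bp)


vector_pmol = ng_to_pmol(50, PUC19_BP)
insert_pmol = ng_to_pmol(150, CRTE_FRAGMENT_BP)
molar_ratio = insert_pmol / vector_pmol

print(f"vector: {vector_pmol:.4f} pmol")
print(f"insert: {insert_pmol:.4f} pmol")
print(f"insert:vector molar ratio is about {molar_ratio:.1f}:1")
```

Note that the ratio itself does not depend on the 650 g/mol figure, since that constant cancels; it only affects the absolute picomole estimates.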
One µg of the pUC19crtE construct was digested with XhoI and StuI restriction
enzymes. These enzymes cut a 273 bp fragment of DNA from the center of the crtE
gene. The digested DNA was separated on a 1% TAE-agarose gel. A 3.6 Kb fragment
representing pUC19 and the remaining ends of the crtE gene was excised and purified.
The kanamycin resistance gene was amplified by PCR from the pCRII vector
(Invitrogen, Carlsbad, CA) using primers designed to introduce a StuI restriction site at
the beginning of the amplified fragment and an XhoI restriction site at the end of the
amplified fragment. The sequences of the primers were as follows.

KANSTUF 5'-ATAAAGGCCTTACATGGCGATAGCTAGACTG-3' (SEQ ID NO:168)
KANXHOR 5'-AAGGCTCGAGAAGGATCTTACCGCTGTTGAG-3' (SEQ ID NO:169)

The PCR reaction mix contained 0.2 µM each primer, 1X Pfu reaction buffer
(Stratagene, La Jolla, CA), 0.2 mM each dNTP, 8 units Pfu, and 5 ng of the pCRII vector
in a 200 µL reaction. The PCR was conducted in a Perkin Elmer Geneamp 2400 and
consisted of an initial denaturation at 94°C for 2 minutes; 8 cycles of a 30 second
denaturation at 94°C, a 1 minute annealing at 55°C, and a 2.5 minute extension at 72°C;
24 cycles of a 30 second denaturation at 94°C, a 1 minute annealing at 55°C, and a 2.5
minute extension at 72°C; followed by a final extension at 72°C for 5 minutes. The PCR
product was separated on a 1% TAE-agarose gel, and a 1.2 Kb fragment was excised and
purified. One µg of purified DNA was digested with XhoI and StuI restriction enzymes
and cleaned using a QIAquick PCR Purification Kit.

25 Fifty ng of the digested pUC 1 9crtE vector DNA was ligated with 75 ng of the 

digested kan PCR product for 16 hours at 14°C using T4 DNA ligase (Roche). One |jL of 
ligation mix was electroporated into 40 jjL of E. coli ElectroMAX™ DHIOB™ 
electrocompetent cells, which were then plated on LB media containing 100 ng/mL 
ampiciilin and 50 |xg/mL kanamycin (LBAK). Plasmid DNA was isolated from cultures 

30 of individual colonies and was digested in separate reactions with the restriction enzymes 
PstI, SphI, and a Stul/Xbal mixture to confirm correct vector structure. 
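
The masses used in this ligation can be converted to an approximate insert-to-vector
molar ratio. The sketch below is illustrative only: it assumes the gel-estimated
sizes given above (3.6 Kb vector fragment, 1.2 Kb kan fragment) and ignores the few
bases trimmed by digestion. The function name is hypothetical.

```python
def insert_to_vector_molar_ratio(insert_ng, insert_kb, vector_ng, vector_kb):
    """Molar ratio of insert to vector for double-stranded DNA.

    Moles are proportional to mass divided by length, so the average mass
    per base pair cancels out of the ratio.
    """
    return (insert_ng / insert_kb) / (vector_ng / vector_kb)


ratio = insert_to_vector_molar_ratio(
    insert_ng=75, insert_kb=1.2, vector_ng=50, vector_kb=3.6
)
print(f"insert:vector is about {ratio:.1f}:1")
```

Under these assumptions the insert is in several-fold molar excess over the vector.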
The crtE gene with the inserted kan gene was amplified by PCR using primers
designed to have ScaI restriction sites on both ends of the fragment. The sequences
of the primers were as follows.

CRTESCAF 5'-ATAGTACTGAAAAAGTTGACACCTGTGGAGTC-3' (SEQ ID NO:170)

CRTESCAR 5'-ATAGTACTAGCACCTGCGAATGGACGAAG-3' (SEQ ID NO:171)

The PCR reaction mix contained 0.2 µM each primer, 1X GC Genomic PCR Buffer, 1 M
GCMelt, 1.1 mM Mg(OAc)2, 0.2 mM each dNTP, 1X Advantage-GC Genomic Polymerase Mix,
and 1 ng of plasmid DNA per µL of reaction mix. The PCR was conducted in a Perkin
Elmer Geneamp 9600 and consisted of an initial denaturation at 94°C for 1 minute; 8
cycles of a 30 second denaturation at 94°C, a 1 minute annealing at 55°C, and a 4
minute extension at 72°C; 25 cycles of a 30 second denaturation at 94°C, a 1 minute
annealing at 60°C, and a 4 minute extension at 72°C; followed by a final extension
at 72°C for 5 minutes. 200 µL of PCR product was separated on a 1% TAE-agarose gel.
A 2.0 Kb fragment was excised and purified. One µg of purified DNA was digested with
ScaI restriction enzyme, and the digested DNA was purified using a QIAquick PCR
Purification Kit.

2.3 µg of pSUP203 plasmid DNA was digested with ScaI restriction enzyme. The
digested DNA was separated on a 1% TAE-agarose gel, and a 7.6 Kb fragment was
excised and purified. The purified plasmid DNA was then dephosphorylated using calf
intestinal alkaline phosphatase (Promega). 75 ng of dephosphorylated plasmid DNA was
ligated with 60 ng and 120 ng of the ScaI-digested crtE-kan PCR product for 16 hours
at 14°C using T4 DNA ligase (New England BioLabs). One µL of ligation mix was
electroporated into 40 µL of E. coli ElectroMAX™ DH10B™ electrocompetent cells,
which were then plated on LB media containing 10 µg/mL tetracycline, for which
pSUP203 carries a resistance gene, and 25 µg/mL kanamycin. Plasmid DNA was isolated
from cultures of individual colonies and digested with ScaI restriction enzyme to
check insert size. 100 ng of plasmid DNA derived from a confirmed colony was
electroporated into electrocompetent cells of the E. coli strain S17-1. This strain
contains a chromosomal copy of the trans-acting elements that mobilize
oriT-containing plasmids during conjugation with a second bacterial strain. It also
carries a gene conferring resistance to the antibiotics streptomycin and
spectinomycin. The transformation reaction was plated on LB media with 10 µg/mL
tetracycline, 25 µg/mL kanamycin, and 25 µg/mL streptomycin. Individual colonies
were resuspended in about 20 µL of 10 mM Tris and heated for 10 minutes at 95°C to
break open the bacterial cells. Two µL of the heated cells was used in a 25 µL PCR
reaction using the CRTESCAF and CRTESCAR primers to confirm the presence of the
crtE-kan insert. The PCR reaction mix contained 0.2 µM each primer, 1X GC Genomic
PCR Buffer, 1.0 M GCMelt, 1.1 mM Mg(OAc)2, 0.2 mM each dNTP, and 1X Advantage-GC
Genomic Polymerase Mix. The PCR was conducted in a Perkin Elmer Geneamp 9600 and
consisted of an initial denaturation at 94°C for 1 minute; 30 cycles of a 30 second
denaturation at 94°C, a 1 minute annealing at 56°C, and a 4 minute extension at
72°C; followed by a final extension at 72°C for 5 minutes.

The pSUP203crtE-kan construct was introduced into R. sphaeroides strain 35053
through conjugation with the E. coli S17-1 strain carrying this vector. The S17-1
donor was grown in LB media with 25 µg/mL kanamycin and 25 µg/mL streptomycin at
37°C for 16 hours. A growing culture of R. sphaeroides strain 35053 was used to
inoculate Sistrom's media using 1/5 to 1/10 dilutions, and the subcultures were
grown at 30°C for about 20 hours. For both the S17-1crtE-kan and 35053 genotypes,
cells were pelleted from 1.5 mL of culture. Pellets were resuspended and pelleted
four times in either 1X Sistrom's salts for the 35053 cells or LB media for the
S17-1 cells. The pellets were each resuspended in 1.5 mL of LB, and 200 µL of the
S17-1 cells was combined with 1.3 mL of the 35053 cells. This mixture was pelleted,
the supernatant removed, and the pellet resuspended in 20 µL of LB media. The
resuspended cells were spotted onto an LB plate and incubated at 30°C for 7.5 hours.
The cells were then scraped off the plate, resuspended in 1.5 mL of 1X Sistrom's
salts, and plated (200 µL/plate) on Sistrom's media supplemented with 25 µg/mL
kanamycin and 10 µg/mL of telluride (SisKTell). The telluride retards the growth of
E. coli cells but is detoxified by R. sphaeroides. After 7 days, small black
colonies were picked off the plates and streaked to fresh plates of the same media.
After 6 days of growth, grayish colonies were patched to LB plates containing 25
µg/mL kanamycin (LBK25) and also to LB plates containing 0.75 µg/mL tetracycline.
Desirable double-crossover events, in which the crtE-kan gene was integrated and
retained in the genome while the vector DNA was lost, exhibited kanamycin resistance
but lacked tetracycline resistance. Colonies resulting from undesirable
single-crossover events demonstrated both kanamycin and tetracycline resistance.
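
The patch-plate interpretation described above amounts to a small decision rule. The
sketch below restates it in code for clarity; the function name and the wording of
the returned labels are hypothetical, and the final branch (a kanamycin-sensitive
colony) is an assumption added for completeness, since the text does not discuss
that case.

```python
def classify_crossover(kan_resistant, tet_resistant):
    """Interpret the antibiotic phenotype of a patched candidate colony."""
    if kan_resistant and not tet_resistant:
        # crtE-kan retained in the genome, vector (tetracycline marker) lost.
        return "double crossover"
    if kan_resistant and tet_resistant:
        # Whole vector still integrated alongside the crtE-kan allele.
        return "single crossover"
    # Assumption: without kanamycin resistance, crtE-kan is not integrated.
    return "no integration detected"


print(classify_crossover(kan_resistant=True, tet_resistant=False))
print(classify_crossover(kan_resistant=True, tet_resistant=True))
```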

The mutants were confirmed using PCR and Southern hybridization as follows.
Colonies that exhibited kanamycin resistance, lacked tetracycline resistance, and
had a gray phenotype were screened by PCR for the crtE locus using the CRTESCAF and
CRTESCAR primers as described above. To confirm that they were R. sphaeroides
colonies with a truncated crtE gene rather than E. coli colonies carrying the
vector, colonies were also screened using primers specific to the R. sphaeroides
ppsR gene and the E. coli dxs gene. Individual colonies were resuspended in about 20
µL of 10 mM Tris, and heated for 10 minutes at 95°C to break open the bacterial
cells. Two µL of the heated cells were used per 25 µL PCR reaction. The PCR reaction
mix contained 0.2 µM each primer, 1X GC Genomic PCR Buffer, 1.0 M GCMelt, 1.1 mM
Mg(OAc)2, 0.2 mM each dNTP, and 1X Advantage-GC Genomic Polymerase Mix. The PCR was
conducted in a Perkin Elmer Geneamp 9600 and consisted of an initial denaturation at
94°C for 1 minute; 8 cycles of a 30 second denaturation at 94°C, a 1 minute
annealing at 55°C, and a 3.5 minute extension at 72°C; 22 cycles of a 30 second
denaturation at 94°C, a 1 minute annealing at 61°C, and a 3.5 minute extension at
72°C; followed by a final extension at 72°C for 5 minutes. All suspected
35053crtE-kan colonies produced a crtE band the same size as the S17-1crtE-kan
control. They all also produced a band of the expected size for the ppsR gene and
did not produce a band for the E. coli dxs gene.
To further confirm the presence of double-crossover events, Southern hybridization
was conducted on eight 35053crtE-kan colonies as well as R. sphaeroides strains
35053 and 17023. Sequence data for the photosynthetic operon of strain 17023 is
available in GenBank and was used to determine restriction enzymes likely to yield
hybridization patterns that would distinguish mutants from non-mutants. Genomic DNA
was isolated from each line using a Gentra Puregene DNA Isolation Kit (Gentra,
Minneapolis, MN). Two µg of genomic DNA was used in digests with the restriction
enzymes ApaI and XhoI. The digests were separated on a 0.8% TAE agarose gel, and the
DNA transferred to a nylon membrane. DIG-labeled molecular weight markers II and III
(Roche) were also included on the gel/membrane. DIG-labeled probes of the crtE locus
were synthesized using a PCR DIG Probe Synthesis Kit (Roche). After baking,
membranes were prehybridized in EasyHyb Buffer (Roche) for at least 2 hours and
hybridized overnight using 400 nL of a 0.5 DIG labeling reaction per mL of
hybridization solution. Detection was conducted using a Wash and Block Buffer Set
(Roche). Membranes were washed two times for 5-10 minutes each at room temperature
in 2X SSC/0.1% SDS and two times for 15-20 minutes each at 68°C in 0.1X SSC/0.1%
SDS. They were then covered with blocking buffer and placed on a shaker for an hour
at room temperature. The blocking buffer was replaced with fresh blocking buffer
containing 150 mU of AP conjugate per mL of buffer, and the membranes shaken at room
temperature for an additional 30 minutes. Membranes were then washed twice for 15
minutes each at room temperature with washing buffer, followed by a five minute wash
with detection buffer. The detection buffer was replaced with fresh detection buffer
containing 20 µL of NBT/BCIP solution per mL of buffer. This was placed in the dark
at room temperature with no shaking until color developed, after which the buffer
was replaced with 10 mM Tris-1 mM EDTA solution.

In the ApaI digest, the mutant lines exhibited a band about 850 bp larger than the
strain 35053 control, which is the size difference expected from the insertion of
the kanamycin gene product in the StuI/XhoI sites. For the XhoI digest, strain 35053
exhibited a band of about 700 bp, strain 17023 had a band of about 1100 bp, mutant
7C had a band of 1550 bp, and the remaining mutants had a band of 2050 bp. The
reason for the size difference in the XhoI bands for the mutants was unclear, but
mutant 7C was used in further studies because it had the expected band size relative
to strain 35053. The resulting R. sphaeroides mutant containing a crtE knockout was
designated ATCC 35053/ΔcrtE(kan).
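
The band comparison for the XhoI digest can be written out as simple arithmetic. The
sketch below uses only the band sizes reported above; the function names and the 100
bp tolerance are hypothetical choices made for illustration, not values from the
described work.

```python
# Size shift reported for the ApaI digest, taken as the expected insertion shift.
EXPECTED_SHIFT_BP = 850

# XhoI band sizes reported in the text (bp).
XHOI_BANDS_BP = {
    "35053": 700,
    "17023": 1100,
    "mutant 7C": 1550,
    "remaining mutants": 2050,
}


def band_shift(sample, control="35053", bands=XHOI_BANDS_BP):
    """Difference in band size between a sample and the parental control."""
    return bands[sample] - bands[control]


def matches_expected_shift(sample, expected_bp=EXPECTED_SHIFT_BP, tolerance_bp=100):
    """True when the observed shift is within tolerance of the expected shift."""
    return abs(band_shift(sample) - expected_bp) <= tolerance_bp


for sample in ("mutant 7C", "remaining mutants"):
    print(sample, band_shift(sample), matches_expected_shift(sample))
```

Only mutant 7C shows the shift seen in the ApaI digest, which is consistent with its
selection for further studies.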



ATCC 35053/ΔcrtE

R. sphaeroides cells lacking crtE were made using sacB selection as follows. A
truncated crtE gene was cloned into the vector pLO1, which is a suicide vector in R.
sphaeroides. The pLO1 vector carries a kanamycin resistance gene, a B. subtilis sacB
gene, an oriT sequence, a ColE1 replicon, and a multiple cloning site (Lenz et al.,
J. Bacteriol., 176(14):4385-93 (1994)). The pLO1crtE plasmid was introduced into R.
sphaeroides strain 35053 through conjugation with an E. coli donor. The kanamycin
resistance gene was used to select for single-crossover events between the truncated
crtE gene and the genomic crtE gene that resulted in incorporation of the pLO1crtE
DNA into the genome. The presence of the sacB gene on the vector allowed for
subsequent selection for the loss of the vector DNA from the genome, as expression
of this gene in the presence of sucrose is lethal to E. coli and to R. sphaeroides
under certain growth conditions. A portion of the double-crossover events that led
to loss of the sacB gene contained the truncated crtE allele. This method of gene
knockout is useful because no residual antibiotic resistance gene is left in the
genome.

A three-step PCR process was used to create a 249 bp in-frame deletion in the crtE
gene. The crtE gene from R. sphaeroides strain 35053 was amplified by PCR using
primers designed to introduce an SphI restriction site at the beginning of the
amplified fragment and a SacI restriction site at the end of the amplified fragment.
The sequences of the primers were as follows.

CRTESPHF 5'-CGTGGCATGCGTGTAAGAAAAAGTTGACACCTGTGGAGTC-3' (SEQ ID NO:172)

CRTESACR 5'-CTAAGAGCTCAGTTCGGGCTCGGTCTCGCCTTTCAGGAAG-3' (SEQ ID NO:173)

The PCR reaction mix contained 0.2 µM each primer, 1X Genome Advantage reaction
buffer, 1 M GCMelt, 1.1 mM Mg(OAc)2, 0.2 mM each dNTP, 1X Genome Advantage
Polymerase, and 1 ng of genomic DNA per µL of reaction mix. The PCR was conducted in
a Perkin Elmer Geneamp 2400 and consisted of an initial denaturation at 94°C for 2
minutes; 32 cycles of a 30 second denaturation at 94°C, a 45 second annealing at
64°C, and a 3 minute extension at 72°C; followed by a final extension at 72°C for 7
minutes. 200 µL of PCR product was separated on a 1% TAE-agarose gel, and a 1.5 Kb
fragment was excised and purified.

The second round of PCR consisted of two separate reactions: reaction A, which used
primers CRTESPHF and CRTERI, and reaction B, which used primers CRTESACR and CRTEFI.
The sequences of primers CRTEFI and CRTERI were as follows.

CRTEFI 5'-GAGAGCGAGAGCCAGATCAAGAAGSGGCTGAAGGACATCC-3' (SEQ ID NO:174)

CRTERI 5'-GGATGTCCTTCAGCCSCTTCTTGATCTGGCTCTCGCTCTC-3' (SEQ ID NO:175)



The 20 nucleotides on the 3' ends of this pair of primers are located near the
center of the crtE gene, 249 bases apart from each other and facing towards the
start (CRTERI) and end (CRTEFI) of the gene. The 20 bp on the 5' ends of these
primers are the reverse complement of the 3' end of the other primer in the pair.
PCR of the two separate reactions was conducted as in the first round, with the
exception that 0.05 ng of first round product per µL of reaction mix was used as
template. Also, the thermocycler program used a 2 minute initial denaturation at
94°C; eight cycles of a 30 second denaturation at 94°C, a 45 second annealing at
56°C, and a 3 minute extension at 72°C; followed by eight cycles of a 30 second
denaturation at 94°C, a 45 second annealing at 60°C, and a 3 minute extension at
72°C; followed by 16 cycles of a 30 second denaturation at 94°C, a 45 second
annealing at 64°C, and a 3 minute extension at 72°C; followed by a final extension
at 72°C for 7 minutes. Both PCR products, about 590 and 650 bp in length, were
separated on a 1% TAE-agarose gel, excised, and gel purified.
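
The overlap design of the two internal primers can be checked computationally. The
sketch below is an illustration under stated assumptions, not part of the described
procedure: it treats the ambiguity code S (G or C) as its own complement, takes the
two second-round products at their approximate gel sizes, and assumes they overlap
by the full primer length. The names reverse_complement and fused_length_bp are
hypothetical.

```python
# A/T and C/G pair as usual; the IUPAC code S (G or C) is complemented by S.
COMPLEMENT = str.maketrans("ACGTS", "TGCAS")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, S only)."""
    return seq.translate(COMPLEMENT)[::-1]


CRTEFI = "GAGAGCGAGAGCCAGATCAAGAAGSGGCTGAAGGACATCC"
CRTERI = "GGATGTCCTTCAGCCSCTTCTTGATCTGGCTCTCGCTCTC"

# The two primers are exact reverse complements, so the second-round
# products share this sequence and can prime each other in round three.
overlap_bp = len(CRTEFI)

# Approximate size of the fused product: the shared region is counted once.
fused_length_bp = 590 + 650 - overlap_bp

print(reverse_complement(CRTEFI) == CRTERI, overlap_bp, fused_length_bp)
```

Under these assumptions the fused fragment comes to roughly 1.2 Kb, which is in line
with removing 249 bp from the 1.5 Kb first-round product.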

The third round of PCR used the same primers and reaction mixture as the first
round of PCR with the exception that a mixture of 10 ng of each second round
fragment was used as template rather than genomic DNA (200 µL reaction). The PCR
program used was also the same as that used in the first round of PCR with the
annealing time lengthened to 1.5 minutes. The 1.2 Kb third-round product was
separated on a 1% TAE-agarose gel and purified. Three µg of purified DNA was
digested with the restriction enzymes SacI and SphI. The digested DNA was cleaned
using a QIAquick PCR Purification Kit and digested with the restriction enzyme StuI.
StuI cut within the deleted region and ensured that there was little or no remaining
full-length product. The digestion mixture was again cleaned using a QIAquick PCR
Purification Kit.

Three µg of the vector pLO1 was digested with the restriction enzymes SphI and
SacI. The enzymes were inactivated by heating to 65°C for 20 minutes, and the vector
was dephosphorylated using shrimp alkaline phosphatase (Roche). The dephosphorylated
vector DNA was gel purified on a 1% TAE-agarose gel.

Sixty-six ng of digested vector DNA was ligated with 80 ng of the digested
third-round PCR product at 16°C for 16 hours using T4 DNA ligase (Roche). One µL of
ligation mix was electroporated into 40 µL of E. coli ElectroMAX™ DH5α™
electrocompetent cells (Life Technologies), which were then plated on LB media
containing 50 µg/mL kanamycin (LBK50). Plasmid DNA was isolated from cultures of
individual colonies and digested with the restriction enzyme SacI and with a mixture
of SphI and SacI to confirm correct vector structure.

One µL of plasmid DNA was used to transform electrocompetent cells of the
previously described E. coli strain S17-1. The electroporated cells were plated on
LB media containing 25 µg/mL of kanamycin, 25 µg/mL of streptomycin, and 25 µg/mL of
spectinomycin (LBKSMST). Single colonies were used to start cultures for plasmid DNA
isolation and used in conjugation. These colonies were also plated on LB media
containing 5% sucrose and 25 µg/mL of kanamycin to ensure that the sacB gene was
still functional. Only colonies that exhibited lethality on the sucrose media were
used in conjugation. The presence of the correct insert size was confirmed by
digestion of plasmid DNA with the restriction enzymes SacI and SphI.

Growing cultures of R. sphaeroides strain 35053 were sub-cultured, using 1/5 and
1/10 volumes of inoculum, in 5 mL Sistrom's media supplemented with 20% LB and grown
at 30°C for 12 hours. The S17-1 donor colonies were grown in LBKSMST media at 37°C
for 12 hours. 1.5-3.0 mL of each culture was pelleted, and the pellets were washed
four times with LB media. Relative pellet size was estimated and about 2 volumes of
35053 cells were used to 1 volume of S17-1 cells. The cell mixture was pelleted,
resuspended in 20 µL of LB media, spotted on an LB plate, and incubated at 30°C for
7-15 hours. The cells were then scraped off the surface of the plate and resuspended
in 1.5 mL of Sistrom's salts. 200 µL of resuspended cells were plated on each of
seven plates of SisKTell media.

Colonies that grew on the plates after about 10 days, representing proposed
single-crossover events, were streaked to new plates of the same media. Upon growth,
single colonies were streaked out on LBK25 media. Purified colonies were patched to
Sistrom's media supplemented with 1X LB, 15% sucrose, 0.5% DMSO (v/v), and 25 µg/mL
kanamycin (SisLBK15%SucDMSO). These were grown in an anaerobic chamber (Becton
Dickinson, Sparks, MD) at 30°C for 5 days to check for lethality of the sacB gene in
the proposed single-crossover events. Concurrently, the cultures were patched to
SisLB media containing 15% sucrose and 0.5% DMSO (v/v) without kanamycin
(SisLB15%SucDMSO). Several of the cultures exhibited both white and red colonies
upon growth on this media. Whitish-gray colonies were purified from these cultures
and tested by PCR to show that they contained the truncated crtE allele. These
colonies were also screened using primers specific to the R. sphaeroides ppsR gene
and the E. coli dxs gene as described above. Potential double crossovers were also
streaked on LBK25 plates to confirm that they were now sensitive to kanamycin. The
resulting R. sphaeroides mutant containing a crtE knockout was designated ATCC
35053/ΔcrtE.

Several discoveries were made using the sacB method to knock out nucleic acid
sequences within the R. sphaeroides genome. First, it was discovered that the
cultures used in conjugations, particularly those of the recipient R. sphaeroides
strain, should be in exponential growth. Second, it was discovered that when using
the S17-1 strain as a vector donor, the use of telluride in the plating medium is
unnecessary as this strain is a proline auxotroph and will not grow on Sistrom's
media without LB supplementation. Third, it was discovered that potential single
crossovers should be screened using two separate PCR reactions. The first reaction
should use a primer within the gene of interest together with a primer homologous to
upstream sequence. The second reaction should use a primer within the gene of
interest together with a primer homologous to downstream sequence. One of these two
reactions should produce a truncated fragment. Fourth, it was discovered that single
crossovers that have been confirmed to have sacB lethality can be grown aerobically
in Sistrom's media for 2 days and then plated on SisLB15%SucDMSO media. The volume
plated varies depending on the rate of growth of the strain, but is about one µL or
less for strain 35053. This is then grown anaerobically for about 5 days. Fifth, it
was discovered that the sacB gene may not completely kill cells with the gene, so
there may be a background level of very small colonies. The desired double-crossover
colonies, however, are typically larger. These colonies should be purified and
screened by PCR to identify whether they contain the truncated or full-length
allele. Sixth, it was discovered that using one primer homologous to sequence
upstream of the knockout gene and one primer homologous to sequence downstream of
the gene is useful in confirming the correct location of the insertion event in
addition to determining the allele that is present.

ATCC 35053/ΔppsR(strep)

R. sphaeroides cells lacking ppsR were made by inserting a
spectinomycin/streptomycin resistance gene into the ppsR sequence as follows. To PCR
amplify the ppsR gene from R. sphaeroides strain 17023, the following primers were
designed based on published sequence (GenBank Accession Number L19596).

PPSRF2 5'-AGTCAGTACTAACTGGTGAAGACGCTGAAG-3' (SEQ ID NO:176)
PPSRR2 5'-GATCAGTACTGTGAACGAATACGATACGCA-3' (SEQ ID NO:177)

Each primer contained a ScaI restriction site. The ppsR gene was amplified using the
following reaction mix and PCR amplification program.



Reaction Mix                                Program

pfu 10X buffer               10 µL          94°C 5 minutes
DMSO                          5 µL          8 cycles of:
dNTP mix (10 mM)              8 µL              94°C 45 seconds
PPSRF2 (50 µM)                2 µL              54°C 45 seconds
PPSRR2 (50 µM)                2 µL              72°C 3 minutes
Genomic DNA (50 ng/µL)        2 µL          25 cycles of:
pfu enzyme (2.5 U/µL)         2 µL              94°C 45 seconds
DI water                     69 µL              61°C 45 seconds
Total:                      100 µL              72°C 3 minutes
                                            72°C 10 minutes
                                            4°C Until used further
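
The reaction mix above can be checked for internal consistency and converted to
final concentrations with the dilution relation C1V1 = C2V2. The following is a
minimal sketch; the dictionary keys and function name are hypothetical, and whether
the 10 mM dNTP stock refers to each nucleotide or to the total is not stated in the
text, so that figure is simply the diluted stock value.

```python
# Component volumes in microliters, as listed in the reaction mix.
REACTION_MIX_UL = {
    "pfu 10X buffer": 10,
    "DMSO": 5,
    "dNTP mix (10 mM)": 8,
    "PPSRF2 (50 uM)": 2,
    "PPSRR2 (50 uM)": 2,
    "Genomic DNA (50 ng/uL)": 2,
    "pfu enzyme (2.5 U/uL)": 2,
    "DI water": 69,
}


def final_concentration(stock_concentration, volume_ul, total_ul):
    """Dilution formula C2 = C1 * V1 / V2 (result is in the stock's units)."""
    return stock_concentration * volume_ul / total_ul


total_ul = sum(REACTION_MIX_UL.values())
primer_um = final_concentration(50, 2, total_ul)      # each primer, in uM
dntp_mm = final_concentration(10, 8, total_ul)        # dNTP stock, in mM
template_ng_per_ul = final_concentration(50, 2, total_ul)

print(total_ul, primer_um, dntp_mm, template_ng_per_ul)
```

The listed volumes sum to the stated 100 µL total, and each primer ends up at 1 µM
in the final reaction.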
The PCR product was separated on a 0.8% TAE agarose gel, and a band of about 1.8 Kb
was cut and gel isolated using a Qiagen Gel Isolation kit (Qiagen, Valencia, CA).
The gel isolated DNA was digested with ScaI (New England BioLabs, Beverly, MA) for 5
hours. The digested DNA was column purified using a Qiagen Gel Isolation kit. The
cut DNA was ligated into vector pSUP203 that was also digested with ScaI enzyme.

2.3 µg of pSUP203 plasmid DNA was digested for 4 hours at 37°C with ScaI
restriction enzyme. The digested DNA was separated on a 1% TAE agarose gel. A 7.6 Kb
fragment was excised and purified. The purified plasmid DNA was then
dephosphorylated using calf intestinal phosphatase (New England BioLabs). 100 ng of
dephosphorylated plasmid DNA was ligated with 200 ng of the ScaI-digested PpsR DNA
for 16 hours at 14°C using T4 DNA ligase (New England BioLabs). One µL of ligation
mix was electroporated into 40 µL of E. coli ElectroMAX™ DH5α™ (Life Technologies,
Gaithersburg, MD) electrocompetent cells, which were then recovered in 1 mL of SOC
media for one hour at 37°C and plated on LB media containing 15 µg/mL tetracycline.
Plasmid DNA was isolated from 8 individual colonies using a Qiagen spin Mini prep
kit and digested with ScaI restriction enzyme to check insert size. Four of the
colonies had a correct insert. 1.5 µg of the plasmid DNA obtained from a confirmed
colony was digested with XhoI restriction enzyme (New England BioLabs, Beverly, MA).
This enzyme has a single restriction site in the open reading frame of the ppsR
gene. A linear DNA band of about 8.4 Kb was gel isolated using a Qiagen Gel
Isolation kit. A spectinomycin/streptomycin resistance omega cassette was obtained
by digesting plasmid pUI1638 (obtained from Dr. Samuel Kaplan's laboratory) with
XhoI enzyme. The digest was separated on a 0.8% TAE agarose gel, and a DNA band of
about 2.1 Kb was gel isolated. This DNA, which encoded the
spectinomycin/streptomycin resistance gene, was ligated to pSUP203/PpsR, which was
also restricted with XhoI enzyme. One µL of ligation mix was electroporated into 40
µL of E. coli ElectroMAX™ DH5α™ (Life Technologies, Gaithersburg, MD)
electrocompetent cells, which were then recovered in 1 mL of SOC media for one hour
at 37°C and plated on LB media with 15 µg/mL tetracycline, 25 µg/mL spectinomycin,
and 25 µg/mL streptomycin. Plasmid DNA was isolated from 10 individual colonies
using a Qiagen spin Mini prep kit and digested separately with ScaI and XhoI
restriction enzymes to check insert size. Five of the colonies had a correct insert.
100 ng of plasmid DNA from a confirmed colony was electroporated into
electrocompetent cells of the E. coli strain SM10. This strain contains a
chromosomal copy of the trans-acting elements that mobilize oriT-containing plasmids
during conjugation with a second bacterial strain. It also carries a gene conferring
resistance to the antibiotic kanamycin. The transformation reaction was recovered in
1 mL of SOC media for one hour and plated on LB media with 10 µg/mL tetracycline, 25
µg/mL kanamycin, 25 µg/mL of streptomycin, and 25 µg/mL spectinomycin.

The pSUP203/ppsR-SM-ST construct was conjugated from the E. coli SM10 host into R.
sphaeroides strain 35053. The SM10 donor was grown in LB media with 25 µg/mL
kanamycin, 25 µg/mL streptomycin, and 25 µg/mL spectinomycin at 37°C for 16 hours. A
growing culture of R. sphaeroides strain 35053 was used to inoculate Sistrom's media
in 1/5 to 1/10 dilutions. These cultures were grown for about 20 hours. Cells were
pelleted from 1.5 mL of culture of both the SM10 pSUP203/PpsR-SM-ST and 35053
genotypes. Pellets were washed four times in Sistrom's media without vitamins and
glucose. The pellets were each resuspended in 1.5 mL of Sistrom's media without
vitamins and glucose. 200 µL of the SM10 pSUP203/PpsR-SM-ST cells were combined with
1.3 mL of the 35053 cells. This mixture was pelleted, the supernatant was removed,
and the pellet was resuspended in 20 µL of LB media. The resuspended cells were
spotted onto an LB plate that was then incubated at 30°C for 7 hours. The cells were
then scraped off the LB plate, resuspended in 1.5 mL of 1X Sistrom's media without
vitamins and glucose, and plated (200 µL/plate) on Sistrom's media supplemented with
25 µg/mL spectinomycin, 25 µg/mL streptomycin, and 10 µg/mL of telluride. The
telluride retards the growth of E. coli cells but is detoxified by R. sphaeroides.
After 7-10 days, small black colonies were picked off the plates and streaked to
fresh plates of the same media. After 6 days of growth, colonies were patched to LB
plates containing 25 µg/mL spectinomycin and 25 µg/mL streptomycin (LBSMST25), and
also to LB plates containing 0.75 µg/mL tetracycline. Desirable double-crossover
events, in which the PpsR-SM-ST gene is retained in the genome and the vector DNA is
lost, would have spectinomycin/streptomycin resistance but lack tetracycline
resistance. Colonies resulting from undesirable single-crossover events would
demonstrate resistance to all of these antibiotic markers.

Colonies that exhibited only spectinomycin/streptomycin resistance and displayed a
deep red color were confirmed as double crossovers by Southern hybridization.
Southern hybridization was conducted on nineteen potential 35053/PpsR-SM-ST colonies
in addition to 35053 and R. sphaeroides strain 17023. Sequence data for the
photosynthetic operon of 17023 is available in GenBank and was used to determine
restriction enzymes likely to yield hybridization patterns that would distinguish
mutants from non-mutants. Genomic DNA was isolated from each line using a Gentra
Puregene DNA Isolation Kit (Gentra, Minneapolis, MN). 2 µg of genomic DNA was used
in digests using the restriction enzymes NcoI, ApaI, and XmaI in separate reactions.
The digests were separated on a 1% TAE agarose gel, and the DNA was transferred to a
nylon membrane (Roche Molecular Biochemicals, Indianapolis, IN). DIG-labeled
molecular weight markers II and III (Roche) were also included on the gel/membrane.
DIG-labeled probes of the PpsR locus were made using a PCR DIG Probe Synthesis Kit
(Roche). After baking, membranes were prehybridized in EasyHyb Buffer (Roche) for at
least 2 hours and hybridized overnight using 400 nL of a 0.5 DIG labeling per mL of
hybridization solution. Detection was done using a Wash and Block Buffer Set
(Roche). Membranes were washed two times for 5-10 minutes at room temperature in 2X
SSC/0.1% SDS and two times for 15-20 minutes at 68°C in 0.1X SSC/0.1% SDS. They were
then covered with blocking buffer and placed on a shaker for an hour at room
temperature. The blocking buffer was replaced with fresh blocking buffer containing
150 mU of AP conjugate per mL of buffer, and the membranes shaken at room
temperature for an additional 30 minutes. Membranes were then washed twice for 15
minutes at room temperature with washing buffer, followed by a five minute wash with
detection buffer. The detection buffer was replaced with fresh detection buffer
containing 20 µL of NBT/BCIP solution per mL of buffer. This was placed in the dark
at room temperature with no shaking until sufficient color was developed.

In the NcoI digest, the lanes of colonies 9 and 10 exhibited a band about 2 Kb
larger than the 35053 control, which is the size difference expected from the
insertion of the spectinomycin/streptomycin resistance cassette into the XhoI site.
For the XmaI digest, 35053 exhibited a single band of about 5.5 Kb, while colonies
9, 10, and 5 exhibited two bands whose summed size was about 2 Kb higher than that
of 35053. Two bands were observed in colonies 9, 10, and 5 because an XmaI site was
introduced along with the spectinomycin/streptomycin resistance cassette. For the
ApaI digest, the control 35053 sample exhibited two bands since the ppsR gene
harbors an ApaI site. Each of these bands was about 2.3 Kb in size. Colonies 9, 10,
and 5 exhibited three bands, whose summed size was about 2 Kb higher than that of
35053. An extra band was observed in colonies 9, 10, and 5 because an ApaI site was
introduced along with the spectinomycin/streptomycin resistance cassette.

The resulting R. sphaeroides mutant containing the ppsR knockout was designated
ATCC 35053/ΔppsR(strep).

ATCC 35053/ΔppsR

R. sphaeroides cells lacking ppsR were made using sacB selection as follows. A
three-step PCR process was used to create a 255 bp in-frame deletion in the PpsR
gene, so that there would be no residual antibiotic resistance gene in the genome.
The PpsR gene from R. sphaeroides strain 35053 was amplified by PCR using primers
designed to introduce a SacI restriction site at the beginning of the amplified
fragment and an SphI restriction site at the end of the amplified fragment. The
sequences of the primers were as follows.

PPSRSACF2 5'-GTCAAATGAGCTCCAAACTGGTGAAGACGCTGAAGGACAT-3' (SEQ ID NO:178)

PPSRSPHR 5'-CAGTCGGGCATGCGTCCATTTCAGTTGACATACTTCTGTG-3' (SEQ ID NO:179)



The following PCR mix and program were used to amplify the PpsR gene.

Reaction Mix                            Program

pfu 10X buffer             10 µL        94°C   2 minutes
DMSO                        5 µL        8 cycles of:
dNTP mix (10 mM)            3 µL          94°C   30 seconds
PPSRSACF2 (100 µM)          1 µL          58°C   45 seconds
PPSRSPHR (100 µM)           1 µL          72°C   3 minutes
Genomic DNA (50 ng/µL)      2 µL        25 cycles of:
pfu enzyme (2.5 U/µL)       2 µL          94°C   30 seconds
DI water                   76 µL          64°C   45 seconds
                                          72°C   3 minutes
Total:                    100 µL        72°C   7 minutes
                                        4°C    Until used further
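The first-round reaction mix and program can be checked arithmetically. The sketch below is illustrative only: it sums the listed volumes, derives the final primer concentration from the 100 µM stock, and totals the programmed hold times. Ramp times and the final 4°C hold are not included, and all function names are hypothetical.

```python
# First-round reaction mix from the table above (volumes in microliters).
REACTION_MIX_UL = {
    "pfu 10X buffer": 10,
    "DMSO": 5,
    "dNTP mix (10 mM)": 3,
    "PPSRSACF2 (100 uM)": 1,
    "PPSRSPHR (100 uM)": 1,
    "Genomic DNA (50 ng/uL)": 2,
    "pfu enzyme (2.5 U/uL)": 2,
    "DI water": 76,
}

# Program as (repeat count, [(temperature in C, hold in seconds), ...]).
PROGRAM = [
    (1, [(94, 120)]),
    (8, [(94, 30), (58, 45), (72, 180)]),
    (25, [(94, 30), (64, 45), (72, 180)]),
    (1, [(72, 420)]),
]


def total_volume_ul(mix):
    """Sum of all component volumes."""
    return sum(mix.values())


def final_concentration(stock_conc, stock_volume_ul, total_ul):
    """Concentration after dilution into the full reaction volume."""
    return stock_conc * stock_volume_ul / total_ul


def programmed_hold_minutes(program):
    """Total of the programmed hold times, ignoring ramping between steps."""
    seconds = sum(
        repeats * sum(hold for _, hold in steps) for repeats, steps in program
    )
    return seconds / 60


print(total_volume_ul(REACTION_MIX_UL))          # total reaction volume in uL
print(final_concentration(100, 1, 100))          # final primer concentration in uM
print(programmed_hold_minutes(PROGRAM))          # programmed hold time in minutes
```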

100 µL of PCR product was separated on a 1% TAE agarose gel, and a fragment of
about 1.8 Kb was excised and purified using a Qiagen gel isolation kit.

The second round of PCR consisted of two separate reactions: reaction A, which
used primers PPSRSACF2 and PPSRMIDR, and reaction B, which used primers
PPSRSPHR and PPSRMIDF. The sequences of primers PPSRMIDF and PPSRMIDR
were as follows.



PPSRMIDF 5'-CTCTTGCTCGGCGGCGTGCGGCTCTATCACGAGGGGGTGGA-3' (SEQ ID NO:180)
PPSRMIDR 5'-TCCACCCCCTCGTGATAGAGCCGCACGCCGCCGAGCAAGAG-3' (SEQ ID NO:181)
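Because the two internal primers are designed to overlap each other, their relationship can be verified directly. The following minimal sketch (function name illustrative) computes the reverse complement of PPSRMIDF and compares it with PPSRMIDR, using the sequences listed above with the line-break hyphens removed.

```python
# Translation table mapping each base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Complement every base, then reverse the strand."""
    return sequence.translate(COMPLEMENT)[::-1]


PPSRMIDF = "CTCTTGCTCGGCGGCGTGCGGCTCTATCACGAGGGGGTGGA"
PPSRMIDR = "TCCACCCCCTCGTGATAGAGCCGCACGCCGCCGAGCAAGAG"

# The two primers as listed are exact reverse complements of each other.
print(reverse_complement(PPSRMIDF) == PPSRMIDR)  # prints True
```

This complementarity is what lets the two second-round products anneal to each other in the third round.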



The 20 nucleotides on the 3' ends of this pair of primers are located near the
center of the ppsR gene, 255 bases apart from each other, and facing towards the start
(PPSRMIDR) and end (PPSRMIDF) of the gene. The 20 bp on the 5' ends of these
primers are the reverse complement of the 3' end of the other primer in the pair. The
following reaction mixes and programs were used to conduct these PCR reactions.



Reaction Mix A                               Program

pfu 10X buffer                   10 µL       94°C   2 minutes
DMSO                              5 µL       8 cycles of:
dNTP mix (10 mM)                  3 µL         94°C   30 seconds
PPSRSACF2 (100 µM)                1 µL         58°C   45 seconds
PPSRMIDR (100 µM)                 1 µL         72°C   3 minutes
DNA from first round (10 ng/µL)   1 µL       25 cycles of:
pfu enzyme (2.5 U/µL)             2 µL         94°C   30 seconds
DI water                         77 µL         64°C   45 seconds
                                               72°C   3 minutes
Total:                          100 µL       72°C   7 minutes
                                             4°C    Until further use

Reaction Mix B                               Program

pfu 10X buffer                   10 µL       94°C   2 minutes
DMSO                              5 µL       8 cycles of:
dNTP mix (10 mM)                  2 µL         94°C   30 seconds
PPSRSPHR (100 µM)                 1 µL         58°C   45 seconds
PPSRMIDF (100 µM)                 1 µL         72°C   3 minutes
DNA from first round (5 ng/µL)    1 µL       25 cycles of:
pfu enzyme (2.5 U/µL)             2 µL         94°C   30 seconds
DI water                         78 µL         64°C   45 seconds
                                               72°C   3 minutes
Total:                          100 µL       72°C   7 minutes
                                             4°C    Until further use



Both PCR products, about 700-800 bp in length, were separated on a 1% TAE
agarose gel, excised, and gel purified using a Qiagen gel isolation kit.

The third round of PCR used primers PPSRSACF2 and PPSRSPHR but used both
fragments derived in the second round of PCR as template. The PCR mixture used was
the same as in the first round of PCR except that equimolar amounts of the round 2
fragments were used as template. The PCR program used was also the same as that used
in the first round of PCR, with the annealing time lengthened to 1.5 minutes. The 1.5 Kb
third-round product was separated on a 1% TAE agarose gel and purified using a Qiagen
gel isolation kit. The purified DNA was digested overnight at 37°C with the restriction
enzymes SacI and SphI.
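The third round joins the two second-round fragments through the sequence they share at the deletion junction. The sketch below models that joining as a simple string operation. The flanking sequences are short hypothetical placeholders, not the actual ppsR sequence; only the shared junction (the PPSRMIDF sequence) is taken from the text, and the function name is illustrative.

```python
# Junction shared by both second-round products (the PPSRMIDF sequence).
JUNCTION = "CTCTTGCTCGGCGGCGTGCGGCTCTATCACGAGGGGGTGGA"

# Hypothetical placeholder flanks standing in for the real gene sequence.
UPSTREAM_FLANK = "ATGAAAGATT"
DOWNSTREAM_FLANK = "GGTAACTGA"

fragment_a = UPSTREAM_FLANK + JUNCTION    # product of reaction A (top strand)
fragment_b = JUNCTION + DOWNSTREAM_FLANK  # product of reaction B (top strand)


def fuse_by_overlap(upstream, downstream, min_overlap=20):
    """Join two fragments at the longest suffix/prefix overlap they share."""
    longest = min(len(upstream), len(downstream))
    for k in range(longest, min_overlap - 1, -1):
        if upstream[-k:] == downstream[:k]:
            return upstream + downstream[k:]
    raise ValueError("fragments share no overlap of the required length")


fused = fuse_by_overlap(fragment_a, fragment_b)
print(len(fragment_a), len(fragment_b), len(fused))
```

In the actual procedure the overlap anneals and is extended by the polymerase, and the outer primers PPSRSACF2 and PPSRSPHR then amplify the fused product. The observed sizes are consistent with this: two fragments of about 700-800 bp give a product of about 1.5 Kb, which is the 1.8 Kb first-round fragment less the 255 bp deletion.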

Three µg of the vector pLO1 was digested with the restriction enzymes SphI and
SacI at 37°C for 16 hours. The enzymes were inactivated by heating to 65°C for 20
minutes. Dephosphorylation of the vector was achieved by adding 4.7 µL of shrimp
alkaline phosphatase 10X buffer (Roche) and 2 µL of shrimp alkaline phosphatase to the
inactivated digest. This mixture was heated at 37°C for 10 minutes and then 65°C for 15
minutes. The dephosphorylated vector DNA was then gel purified on a 1.0% TAE
agarose gel.

98 ng of vector DNA was ligated with 210 ng of the digested third-round PCR product at
14°C for 14 hours using T4 DNA ligase (Roche). One µL of ligation mix was
electroporated into 40 µL of E. coli ElectroMAX™ DH5α™ electrocompetent cells (Life
Technologies), which were then recovered in 1 mL of SOC media for one hour and plated
on LB media with 25 µg/mL kanamycin (LBK25). Plasmid DNA was isolated from
eight individual colonies. Plasmid DNA was checked for the correct insert with a PCR
screen using the PCR protocol from the first round.

One µL of plasmid DNA was used to transform electrocompetent cells of E. coli
strain S17-1. The electroporated cells were recovered in 1 mL of SOC media for one
hour and plated on LB media with 25 µg/mL of kanamycin, 25 µg/mL of streptomycin,
and 25 µg/mL of spectinomycin (LBKSMST). Single colonies were used to start cultures
for plasmid DNA isolation and used in conjugation. These colonies were also plated on
LB media containing 5% or 15% sucrose, and 25 µg/mL of kanamycin to ensure that the
sacB gene was still functional. Only colonies that showed lethality on the sucrose media
were used in conjugation. The presence of the correct insert size was confirmed by
colony PCR.

Growing cultures of R. sphaeroides strain 35053 were subcultured, using 1/4 and
1/8 volumes of inoculum, in 5 mL Sistrom's media supplemented with 20% LB and
grown at 30°C for 9 hours. The S17-1 donor colonies were grown in LBKSMST media
at 37°C for 16 hours. 3.0 mL of 35053 and 0.5 mL of S17-1 donor cells were centrifuged
and washed four times in Sistrom's media without glucose. Each cell pellet was
resuspended in 20 µL LB, and the S17-1 donor suspension was mixed with 35053. The
mixture was then spotted on an LB plate, which was incubated at 30°C for 14-16 hours. The cells
were then scraped off the surface of the plate and resuspended in 1.5 mL of Sistrom's
salts. 200 µL of resuspended cells were plated on each of seven Sistrom's media
plates that were supplemented with 25 µg/mL of kanamycin.

Colonies that grew on the plates after about 10-14 days, representing putative
single-crossover events, were streaked to new plates of the same media. Upon growth,
single colonies were transferred to LBK25 media. These cultures were grown for 36 to
48 hours in Sistrom's media supplemented with 20% LB and no kanamycin at 30°C. 0.1
µL and 5 µL of this culture was plated on LB media that was supplemented with
Sistrom's salts and 15% sucrose. The plates were placed in an anaerobic chamber
(Becton Dickinson, Sparks, MD), and the chamber was placed in a 30°C incubator. After
4-5 days, several colonies appeared on the plates, indicating the occurrence of double-
crossover events. Four colonies from each single-crossover strain were purified by
streaking on LB agar plates. Single colonies of double-crossover strains were screened by
PCR for integration of the truncated version of the ppsR gene into the chromosome. For
screening, the following primers, located upstream and downstream of the PpsR gene,
were used. The use of upstream and downstream primers confirms
both the locus of integration and the truncation of the PpsR gene.

PPSRUPF 5'-GAGCAGCACACTCTGGGAGC-3' (SEQ ID NO:182)
PPSRDNR 5'-CCACACAGGTAGGACACCCAC-3' (SEQ ID NO:183)



The following reaction mix and PCR program were used.

Reaction Mix                            Program

TaqMg+ 10X buffer        2.5 µL         94°C   2 minutes
DMSO                     1.25 µL        29 cycles of:
dNTP mix (10 mM)         0.5 µL           94°C   30 seconds
PPSRUPF (100 µM)         0.125 µL         61°C   45 seconds
PPSRDNR (100 µM)         0.125 µL         72°C   3 minutes
Cell boil mix            2 µL           72°C   7 minutes
Taq enzyme (5 U/µL)      0.2 µL         4°C    Until further use
DI water                 18.3 µL
Total:                   25 µL





The cell boil mix was prepared by resuspending a single colony in 20-25 µL of
water. The suspension was heated at 95°C for 10 minutes in a PCR machine. The tube
was given a quick spin to pellet the solids.
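When many colonies are screened at once, the 25 µL recipe above is usually scaled into a master mix. The sketch below is one way to do that and is not part of the described procedure: the 10% overage default and the function names are assumptions, and the cell boil mix is kept out of the master mix because it differs for every colony.

```python
# Per-reaction volumes (microliters) from the colony PCR table above.
PER_REACTION_UL = {
    "TaqMg+ 10X buffer": 2.5,
    "DMSO": 1.25,
    "dNTP mix (10 mM)": 0.5,
    "PPSRUPF (100 uM)": 0.125,
    "PPSRDNR (100 uM)": 0.125,
    "Cell boil mix": 2.0,
    "Taq enzyme (5 U/uL)": 0.2,
    "DI water": 18.3,
}
TEMPLATE = "Cell boil mix"  # added to each tube individually


def master_mix(n_reactions, overage=0.10):
    """Scale every shared component; overage (assumed 10%) covers pipetting loss."""
    scale = n_reactions * (1 + overage)
    return {
        name: round(volume * scale, 3)
        for name, volume in PER_REACTION_UL.items()
        if name != TEMPLATE
    }


def aliquot_volume_ul():
    """Master mix volume to dispense per tube before adding 2 uL of template."""
    shared = sum(v for name, v in PER_REACTION_UL.items() if name != TEMPLATE)
    return round(shared, 3)


print(master_mix(8))
print(aliquot_volume_ul())
```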

The colonies that exhibited the truncated version of the PpsR gene were further
tested for kanamycin sensitivity by streaking them on LB plates that were supplemented
with 25 µg/mL of kanamycin. Also, these colonies were PCR screened for the kanamycin
resistance gene.

The resulting R. sphaeroides mutant containing the ppsR knockout was designated
ATCC 35053/ΔppsR.

ATCC 35053/ΔccoN

R. sphaeroides cells lacking ccoN were made using sacB selection as follows. A
mutant of R. sphaeroides strain 2.4.1 having a 546 bp deletion in the ccoN gene (R.
sphaeroides 2.4.1/ΔccoN) was obtained from the laboratory of Samuel Kaplan at the
University of Texas (Oh and Kaplan, Biochemistry, 38:2688-2696 (1999)). The mutated
ccoN locus of this strain was amplified by PCR and cloned into pLO1. This plasmid was
transformed into E. coli strain S17-1. The S17-1 strain was conjugated with R.
sphaeroides strain 35053, and colonies were identified in which the truncated locus had
replaced the native ccoN gene.

The truncated ccoN gene from R. sphaeroides 2.4.1/ΔccoN was amplified by PCR
using primers designed to introduce a SacI restriction site at the beginning of the
amplified fragment and an SphI restriction site at the end of the amplified fragment. The
sequences of the primers were as follows.



CCONSACF 5'-TCAGAGCTCGTGTGATCGAATGGGGCTTTGTTCCTTGATG-3' (SEQ ID NO:184)
CCONSPHR 5'-GAAGCATGCAGGTGATCGACGTGCCACTCGTCCGAATAG-3' (SEQ ID NO:185)



The PCR reaction mix contained 0.2 µM each primer, 1X Native Pfu reaction
buffer, 0.2 mM each dNTP, 5% DMSO, and 10 units of Pfu DNA polymerase in a 200 µL
reaction. Three µL of the glycerol stock was diluted in 20 µL of 10 mM Tris and heated
at 94°C for 10 minutes, after which 4 µL was added to the PCR reaction. The PCR was
conducted in an MJ Research PT100 and consisted of an initial denaturation at 94°C for 2
minutes; 32 cycles of a 30 second denaturation at 94°C, a 1 minute annealing at 66°C,
and a 4 minute extension at 72°C; followed by a final extension at 72°C for 7 minutes.
The PCR product was separated on a 1% TAE-agarose gel, and a 1.6 Kb fragment was
excised and purified. Three µg of purified PCR product was digested with SacI
restriction enzyme and separated on a 1% TAE gel. A 1.4 Kb band was excised and
purified. A SacI restriction site exists about 200 bp from the CCONSPHR end of the
original PCR product.
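The ccoN reaction is specified by final concentrations rather than by volumes. The sketch below converts those concentrations into pipetting volumes with the dilution relation C1·V1 = C2·V2. The stock concentrations (100 µM primers, a dNTP mix at 10 mM each, 10X buffer) are assumptions borrowed from the ppsR reactions earlier in this example and are not stated for this reaction; the function name is illustrative.

```python
REACTION_VOLUME_UL = 200


def stock_volume_ul(final_conc, stock_conc, reaction_volume_ul):
    """Solve C1*V1 = C2*V2 for V1; both concentrations must share a unit."""
    return final_conc * reaction_volume_ul / stock_conc


# Assumed stocks: 100 uM primers, dNTP mix at 10 mM each, 10X buffer.
primer_ul = stock_volume_ul(0.2, 100, REACTION_VOLUME_UL)  # uM over uM
dntp_ul = stock_volume_ul(0.2, 10, REACTION_VOLUME_UL)     # mM over mM
buffer_ul = stock_volume_ul(1, 10, REACTION_VOLUME_UL)     # 1X from 10X
dmso_ul = 0.05 * REACTION_VOLUME_UL                        # 5% by volume

print(primer_ul, dntp_ul, buffer_ul, dmso_ul)
```

The band sizes reported for the SacI digest follow the same kind of arithmetic: cutting about 200 bp from a 1.6 Kb product leaves the 1.4 Kb band that was excised.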






WO 02/26933 



PCT/US01/30328 



Three µg of the vector pLO1 was digested with the restriction enzyme SacI. The
enzyme was inactivated by heating to 65°C for 20 minutes, and the digested vector was
dephosphorylated using shrimp alkaline phosphatase. The dephosphorylated vector DNA
was gel purified on a 1% TAE-agarose gel.

50 ng of digested vector DNA was ligated with 65 ng of the digested ccoN PCR
product at 16°C for 16 hours using T4 DNA ligase (Roche). One µL of ligation mix was
electroporated into 40 µL of E. coli ElectroMAX™ DH5α™ electrocompetent cells, which
were then plated on LBK media. Plasmid DNA was isolated from cultures of individual
colonies and digested with the restriction enzyme SacI to confirm correct insert size.

The E. coli strain S17-1 contains a chromosomal copy of the trans-acting elements
that mobilize oriT-containing plasmids during conjugation with a second bacterial strain.
It also carries genes conferring resistance to the antibiotics streptomycin and
spectinomycin. In addition, S17-1 is a proline auxotroph and will not grow on
unsupplemented Sistrom's media. One µL of DNA of the truncated ccoN construct was
used to transform electrocompetent cells of E. coli strain S17-1. The electroporation was
plated on LBKSMST. Single colonies were used to start cultures for plasmid DNA
isolation and used in conjugation. These colonies were also plated on LB media
containing 5% sucrose and 25 µg/mL of kanamycin to ensure that the sacB gene was still
functional. Only colonies that exhibited lethality on the sucrose media were used in
conjugation. The presence of the correct insert size was confirmed by digestion of
plasmid DNA with the restriction enzyme SacI.

Growing cultures of R. sphaeroides strain 35053 were subcultured in Sistrom's
media supplemented with 20% LB to ensure that they were in exponential growth. The
S17-1 donor colonies were grown in LBKSMST media at 37°C overnight or subcultured
from growing colonies. 2-4 mL of each culture was centrifuged, and the pellets were
washed four times in LB media. Relative pellet size was estimated, and about 2 volumes
of 35053 cells were used to 1 volume of S17-1 cells. The cell mixture was then pelleted,
resuspended in 20 µL of LB media, and spotted on an LB plate. This plate was incubated
at 30°C for 7-15 hours. The cells were then scraped off the surface of the plate and
resuspended in 1.2 mL of Sistrom's salts. 200 µL of resuspended cells were plated on
each of six plates of Sistrom's media containing 25 µg/mL of kanamycin (SisK).

Colonies that grew on the plates after about 10 days, representing potential single-
crossover events, were streaked to new plates of SisK media. Upon growth, single
colonies were transferred to LBK media. Purified colonies were streaked to Sistrom's
media supplemented with 1X LB, 15% sucrose, 0.5% DMSO (v/v), and 25 µg/mL
kanamycin (SisLBK15%SucDMSO). These were grown in an anaerobic chamber
(Becton Dickinson, Sparks, MD) at 30°C for 5 days to check for lethality of the sacB
gene in the single-crossover events. The purified colonies were also screened in two
separate PCR reactions. The first reaction used a primer within the gene of interest
(CCONR) together with a primer homologous to upstream sequence (CCONUPF2), and
the second reaction used a primer within the gene of interest (CCONSACF) together with
a primer homologous to downstream sequence (CCONDNR2). Single-crossover events
exhibited a truncated fragment in one of the two reactions, depending on whether the
crossover occurred upstream or downstream of the deletion. The primer sequences were
as follows.

CCONUPF2 5'-CTCACAACCTCCAACCGATG-3' (SEQ ID NO:186)
CCONR 5'-CGATGGTGACCACGAAGAAG-3' (SEQ ID NO:94)
CCONDNR2 5'-CGTAACGCTCGGTCTCGTC-3' (SEQ ID NO:129)

Single-crossover colonies were grown in Sistrom's media supplemented with 20%
LB. After 2 days of growth, 0.1-1 µL of the cultures was plated on Sistrom's media
supplemented with 1X LB, 0.5% DMSO (v/v), and 15% sucrose (SisLB15%SucDMSO).
These cultures were grown anaerobically for about 5 days. The sacB gene did not always
completely kill cells with the gene, so there was often a background level of very small
colonies. The larger colonies, which represented double-crossover events, were purified
on LB media and screened by PCR to identify whether they contained the truncated or
full-length allele. The CCONUPF2 and CCONDNR2 primers were used in this PCR
screen to ensure that the truncated gene also was inserted in the correct location in the
genome. Potential double-crossovers were also streaked on LBK plates to confirm that
they were now sensitive to kanamycin.

The resulting R. sphaeroides mutant containing the ccoN knockout was
designated ATCC 35053/ΔccoN.

ATCC 35053/ΔcrtE/ΔccoN

R. sphaeroides cells lacking crtE and ccoN were made as follows. The wildtype
ccoN allele of a crtE knockout mutant (ATCC 35053/ΔcrtE) was replaced with a
truncated ccoN allele as described above. Double-crossover colonies having the
truncated ccoN allele were then re-screened by PCR for the crtE and ccoN loci. These
colonies were plated on LBK25 and screened by PCR to confirm the loss of the vector
from the genome. The resulting R. sphaeroides mutant containing the crtE knockout and
ccoN knockout was designated ATCC 35053/ΔcrtE/ΔccoN.

ATCC 35053/ΔcrtE/ΔppsR/ΔccoN

R. sphaeroides cells lacking crtE, ppsR, and ccoN were made as follows. The
wildtype ppsR allele of a crtE/ccoN knockout mutant (ATCC 35053/ΔcrtE/ΔccoN) was
replaced with a truncated ppsR allele as described above with the following exceptions.
After conjugation on an LB plate, the conjugated cells were plated on Sistrom's media
containing 25 µg/mL of kanamycin and 0.5% DMSO (SisKDMSO) rather than on SisK.
After purification on SisKDMSO and LBKDMSO, single-crossovers were grown
aerobically in Sistrom's media supplemented with 1X LB and 0.5% DMSO. After 2 days
of growth, the cultures were plated on Sistrom's media supplemented with 1X LB, 15%
sucrose, and 0.5% DMSO, and grown anaerobically for 5 days. Potential double-
crossover colonies were purified on LBDMSO and screened by PCR using the PPSRUPF
and PPSRDNR primers. Colonies having the truncated ppsR allele were then rescreened
by PCR for the crtE, ppsR, and ccoN loci. These colonies were also plated on
LBKDMSO and screened by PCR to confirm the loss of the vector from the genome.
The resulting R. sphaeroides mutant containing the crtE knockout, ppsR knockout, and
ccoN knockout was designated ATCC 35053/ΔcrtE/ΔppsR/ΔccoN.

Example 10 - Making recombinant microorganisms that
overexpress a particular sequence while containing a knock-out
Any construct developed for the overexpression of genes is transferred to any of
the background genotypes developed by gene knockout techniques. For example, the
pMCS2tetP/Stdxs/Rsdds/EcUbiC or the pMCS2tetP/Stdxs/Rsdds/RsLytB construct is
transferred into the R. sphaeroides ATCC 35053/ΔcrtE/ΔppsR/ΔccoN mutant cells to
combine the productive effects of gene overexpression and engineering of gene regulation
or carbon flow. The construct is transferred to the desired genotype by electroporation or
conjugation. Conjugation of a plasmid into an R. sphaeroides strain follows the
procedure described for the isolation of single-crossover events except that, since the
efficiency of plasmid transfer is much higher than that of chromosomal integration, a 0.1-
1 µL plating volume from the ~400 µL conjugation recovery is ample to obtain
transformed colonies. Single colony PCR is used to check the integrity of the construct in
the new background, and evaluations of the productivity of the new strain are made.
Genes that are productive are integrated, in one or more copies, into appropriate regions
of the chromosome of a productive strain along with or downstream of a highly-
expressing promoter.

Example 11 - Three liter fermentations

Cultures of R. sphaeroides ATCC 35053 with various inserted genes or knockouts
were grown in 5 mL culture tubes containing Sistrom's media with 4 g/L glucose. After
48 hours of growth at 30°C with 250 rpm shaking, the entire contents of the tube were
used to inoculate a 300 mL baffled shake flask containing Sistrom's media with 4 g/L
glucose. After incubation at 30°C for 48 hours, the entire contents of the flask were added
to 2.7 L of Sistrom's media containing 40 g/L glucose in a B. Braun Biotech International
Model Biostat B fermenter.

The fermenter was maintained at 30°C, and the cascade was set to maintain the
dissolved oxygen (DO) at 40%. The air inflow was maintained at 1 vvm, and the pH was
maintained at 7.3 with an automatic feed of 2N NH4OH. Foaming was controlled by
addition of Sigma Antifoam 289. Kanamycin to a concentration of 50 µg/mL was added
to fermentations with strains containing the broad host range vector pBBR1MCS2 either
with or without an inserted gene. At 24 to 30 hours, when the agitation increase to
maintain a DO of 40% had leveled off, the agitation and DO were decoupled, and the
agitation was fixed at 240 rpm. The air inflow was lowered to 0.3 vvm. Kanamycin to
50 µg/mL was again added to fermentations containing the expression vector.

The fermentation samples for coenzyme Q10 and spheroidenone analysis were
removed at 69 to 75 hours into the fermentation.

Example 12 - Three-hundred milliliter fermentations
Cultures of R. sphaeroides ATCC 35053 with various overexpressed genes or
knockouts were grown in 5 mL culture tubes containing Sistrom's media with 4 g/L
glucose. After 48 hours of growth at 30°C with 250 rpm shaking, the entire contents of
the tube were used to inoculate a 300 mL baffled shake flask containing Sistrom's media
with 4 g/L glucose. After incubation at 30°C for 48 hours, 30 mL of the flask contents was added
to 270 mL of Sistrom's media containing 40 g/L glucose in a 500 mL Infors AG-CH-4103
fermenter.

The fermenter was maintained at 30°C, and the cascade was set to maintain the
dissolved oxygen (DO) at 40%. The air inflow was maintained at 1 vvm, and the pH was
maintained at 7.3 with an automatic feed of 2N NH4OH. Foaming was controlled by
addition of Sigma Antifoam 289. Kanamycin to a concentration of 50 µg/mL was added
to fermentations with strains containing the broad host range vector pBBR1MCS2 either
with or without an inserted gene. At 24 to 30 hours, when the agitation increase to
maintain a DO of 40% had leveled off, the agitation and DO were decoupled, and the
agitation was fixed at 400 rpm. The air inflow was lowered to 0.3 vvm. Kanamycin to
50 µg/mL was again added to fermentations containing the expression vector.

The fermentation samples for coenzyme Q10 and spheroidenone analysis were
removed at 69 to 75 hours into the fermentation.
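The two fermentation scales in Examples 11 and 12 use the same proportions, which a short calculation makes explicit. The sketch below assumes that the air inflow unit is vvm (volumes of air per volume of liquid per minute) and that the working volume is simply inoculum plus fresh medium; the function names are illustrative.

```python
# Inoculum and fresh medium volumes (liters) for the two scales described.
SCALES = {
    "Example 11 (3 L)": {"inoculum_l": 0.3, "medium_l": 2.7},
    "Example 12 (300 mL)": {"inoculum_l": 0.03, "medium_l": 0.27},
}


def inoculum_fraction(inoculum_l, medium_l):
    """Fraction of the starting working volume contributed by the inoculum."""
    return inoculum_l / (inoculum_l + medium_l)


def air_flow_l_per_min(vvm, working_volume_l):
    """Convert an aeration rate in vvm to an absolute gas flow."""
    return vvm * working_volume_l


for name, volumes in SCALES.items():
    working = volumes["inoculum_l"] + volumes["medium_l"]
    fraction = inoculum_fraction(volumes["inoculum_l"], volumes["medium_l"])
    print(
        f"{name}: inoculum {fraction:.0%}, "
        f"air {air_flow_l_per_min(1.0, working):.2f} L/min at 1 vvm, "
        f"{air_flow_l_per_min(0.3, working):.2f} L/min at 0.3 vvm"
    )
```

Both scales start from a 10% inoculum, so the main differences between them are the vessel and the fixed agitation speed (240 rpm versus 400 rpm).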

Example 13 - Analysis of Spheroidenone
At various times during the fermentation, 15 mL of fermentation volume was
withdrawn. The volume of sample needed to obtain 5 mg of dry cell weight (DCW) was
used for spheroidenone analysis. The sample was washed one time in water and
resuspended in an equal volume of water. The volume of sample calculated in step 1 was
added to a 1.8 mL microfuge tube and was centrifuged at 10,000 rpm for 3 minutes in an
IEC MicroMax microfuge. The supernatant was removed, and the pellet was completely
resuspended in 1.0 mL of Acetone:Methanol (7:2) and stored at room temperature away
from light for 30 minutes. The sample was mixed once during this incubation. After
incubation, the sample was centrifuged at 10,000 rpm for 3 minutes, and the extract
(supernatant) was collected. Samples were stored at -20°C for analysis at a later time. The
carotenoid extract was analyzed on a spectrophotometer scanning in the range of 350 nm
to 800 nm, and the OD480 was recorded. The amount of carotenoid in mg/100 mL of
culture was calculated using the following equation:

Spheroidenone (mg) / 100 mL culture = ((OD480 - (0.0816 * OD770)) * 0.484) / Vol. of
sample from step 1

From mg of Spheroidenone/100 mL of culture, the amount of Spheroidenone/mg
of dry cell weight (DCW) was calculated using the DCW number as the conversion
factor. Care was taken to correct for any dilution factor required while the sample was
scanned on the spectrophotometer.
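The spheroidenone equation above translates directly into a small function. This is a sketch under stated assumptions: the sample volume is taken to be in mL, and the optional dilution factor is applied by multiplying the corrected absorbance, which is one reasonable reading of the dilution correction mentioned in the text rather than a documented formula. The example readings are hypothetical.

```python
def spheroidenone_mg_per_100ml(od480, od770, sample_volume_ml, dilution_factor=1.0):
    """((OD480 - 0.0816 * OD770) * 0.484) / sample volume, per the equation above.

    dilution_factor scales the corrected absorbance when the extract was
    diluted before scanning (an assumption, see the lead-in).
    """
    corrected = (od480 - 0.0816 * od770) * dilution_factor
    return corrected * 0.484 / sample_volume_ml


# Hypothetical readings for an extract prepared from 1.2 mL of washed sample.
value = spheroidenone_mg_per_100ml(od480=0.85, od770=0.05, sample_volume_ml=1.2)
print(round(value, 4))
```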

Example 14 - Analyzing CoQ(10) levels produced via fermentation

100 mL of fermentation broth was removed once per day and placed in a tared
250 mL centrifuge bottle. The samples were centrifuged at 15,000 x g for 5 minutes, the
supernatant was poured off, and the samples were resuspended in 50 mL cold water. The
samples were centrifuged again at 15,000 x g for 5 minutes, and the supernatant was
poured off. The wet weight of the biomass was determined, and the biomass was
resuspended in 1.5 times its weight in water. The samples were stored covered with foil
at -80°C before analysis.

Before analysis, the samples were warmed at 21°C for 15 minutes. 1.0 mL was
withdrawn. Sodium dodecyl sulfate was added to a final concentration of 1.67%. The
samples were extracted with 14 mL of a hexane:ethanol (5:2) mixture. The samples were
then evaporated to dryness and dissolved in 2 mL of a methanol:ethanol (9:2) mixture.
The samples were then analyzed on a Waters Nova-Pak C18 (3.9 x 150 mm: 4 µm)
column with a PDA detector set from 200-300 nm. Resolution was at 12 nm with a
maximum absorbance at 275 nm. The run time was 15 minutes, and the injection volume
was 20 µL.

The dry weight of the samples was determined by drying an aliquot at 105°C in an
aluminum weighing pan for at least four hours.

Example 15 - Production of CoQ(10)
The following seven experiments measured the amount of CoQ(10) produced by
the indicated microorganisms in a 3 liter scale fermentation.

In experiment 1, the following data were collected after 96 hours of fermentation: 



Strain                      Coenzyme Q10 (ppm) dry weight basis
ATCC 35053                  2950
ATCC 35053/ΔcrtE            6508



These results demonstrated that the inactivation of crtE increased the production of 
CoQ(10). 

In experiment 2, the following data were collected after 69 to 75 hours of 
fermentation: 



Strain                      Coenzyme Q10 (ppm) dry weight basis
ATCC 35053                  1655
ATCC 35053/ΔppsR(strep)     3812



These results demonstrated that the inactivation of ppsR increased the production of
CoQ(10).

In experiment 3, the following data were collected after 69 to 75 hours of
fermentation:

Strain                 Coenzyme Q10 (ppm)     Spheroidenone (ppm)
                       dry weight basis       dry weight basis
ATCC 35053             2951                   1980
ATCC 35053/ΔccoN       3527                   2959

These results demonstrated that the inactivation of ccoN increased the production of
CoQ(10) and spheroidenone.

In experiment 4, the following data were collected after 69 to 75 hours of 
fermentation: 



Strain                                  Coenzyme Q10 (ppm) dry weight basis
ATCC 35053/ΔcrtE                        3255
ATCC 35053/ΔcrtE/ΔccoN isolate 8-7      7951

These results demonstrated that the inactivation of crtE and ccoN increased the
production of CoQ(10) as compared to inactivating crtE only.

In experiment 5, the following data were collected after 69 to 75 hours of 
fermentation: 



Strain                                  Coenzyme Q10 (ppm) dry weight basis
ATCC 35053/ΔcrtE                        3545
ATCC 35053/ΔcrtE/ΔccoN isolate 111      4984
ATCC 35053/ΔcrtE/ΔppsR/ΔccoN            11,676



These results demonstrated that the inactivation of crtE and ccoN increased the
production of CoQ(10) as compared to inactivating crtE only. In addition, these results
demonstrated that the inactivation of crtE, ccoN, and ppsR increased the production of
CoQ(10) as compared to inactivating only crtE and ccoN.

In experiment 6, the following data were collected after 69 to 75 hours of
fermentation:



Strain                                  Coenzyme Q10 (ppm) dry weight basis
ATCC 35053/ΔcrtE                        3833
ATCC 35053/ΔcrtE/pMCS2tetP/Stdxs        4928
ATCC 35053/ΔcrtE/pMCS2glnP/Stdxs        5508
ATCC 35053/ΔcrtE/pMCS2tetP/Stdds        4652



These results demonstrated that the inactivation of crtE together with the addition of
Stdxs increased the production of CoQ(10) as compared to inactivating crtE only. In
addition, these results demonstrated that the use of the gln promoter with Stdxs resulted in
more production of CoQ(10) when compared to the use of the tet promoter with Stdxs.
Further, these results demonstrated that the inactivation of crtE together with the addition
of Stdds increased the production of CoQ(10) as compared to inactivating crtE only.

In experiment 7, the following data were collected after 69 to 75 hours of
fermentation:



Strain                                        CoQ(10) (ppm) dry weight basis
ATCC 35053/pMCS2tetP                          3909
ATCC 35053/pMCS2tetP/Stdxs/Rsdds              5387
ATCC 35053/pMCS2tetP/Stdxs/Rsdds/RsLytB       5962
ATCC 35053/pMCS2tetP/Stdxs/Rsdds/EcUbiC       6439



These results demonstrated that the addition of Stdxs and Rsdds increased the production
of CoQ(10) as compared to adding vector only. In addition, these results demonstrated
that the addition of either RsLytB or EcUbiC together with the addition of Stdxs and
Rsdds increased the production of CoQ(10) as compared to adding only Stdxs and Rsdds.
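The size of each effect in the seven 3 liter experiments is easier to compare as a ratio to the control run in the same experiment. The sketch below computes those ratios from the values reported above; strain labels are shortened for readability, and the data structure and function name are illustrative only.

```python
# CoQ(10) titers (ppm, dry weight basis) copied from the 3 liter tables above.
# Each entry: experiment number -> (control ppm, {shortened strain label: ppm}).
EXPERIMENTS_3L = {
    1: (2950, {"ΔcrtE": 6508}),
    2: (1655, {"ΔppsR(strep)": 3812}),
    3: (2951, {"ΔccoN": 3527}),
    4: (3255, {"ΔcrtE/ΔccoN isolate 8-7": 7951}),
    5: (3545, {"ΔcrtE/ΔccoN isolate 111": 4984, "ΔcrtE/ΔppsR/ΔccoN": 11676}),
    6: (3833, {
        "ΔcrtE/pMCS2tetP/Stdxs": 4928,
        "ΔcrtE/pMCS2glnP/Stdxs": 5508,
        "ΔcrtE/pMCS2tetP/Stdds": 4652,
    }),
    7: (3909, {
        "pMCS2tetP/Stdxs/Rsdds": 5387,
        "pMCS2tetP/Stdxs/Rsdds/RsLytB": 5962,
        "pMCS2tetP/Stdxs/Rsdds/EcUbiC": 6439,
    }),
}


def fold_changes(experiments):
    """Return {experiment: {strain: titer divided by that experiment's control}}."""
    return {
        number: {strain: ppm / control for strain, ppm in strains.items()}
        for number, (control, strains) in experiments.items()
    }


for number, ratios in fold_changes(EXPERIMENTS_3L).items():
    for strain, ratio in ratios.items():
        print(f"experiment {number}: {strain} = {ratio:.2f}x control")
```

Ratios are computed within each experiment because the control titer itself varies from run to run (for example, 1655 to 2951 ppm for ATCC 35053).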

The following four experiments measured the amount of CoQ(10) produced by
the indicated microorganisms in a 300 mL scale fermentation.

In experiment 1, the following data were collected after 69 to 75 hours of
fermentation:



Strain                                        CoQ(10) (ppm) dry weight basis
ATCC 35053/pMCS2tetP                          5250
ATCC 35053/pMCS2tetP/Stdxs                    5758
ATCC 35053/pMCS2tetP/Rsdds                    6944
ATCC 35053/pMCS2tetP/Stdxs/Rsdds              6875
ATCC 35053/pMCS2tetP/Stdxs/Rsdds/EcUbiC       7808



These results demonstrated that the addition of either Stdxs or Rsdds increased the
production of CoQ(10) as compared to adding vector only. In addition, these results
demonstrated that the addition of Stdxs, Rsdds, and EcUbiC increased the production of
CoQ(10) as compared to adding only Stdxs and Rsdds.

In experiment 2, the following data were collected after 69 to 75 hours of 
fermentation: 



Strain                                        CoQ(10) (ppm) dry weight basis
ATCC 35053/pMCS2tetP                          5483
ATCC 35053/pMCS2tetP/EcUbiC                   6360
ATCC 35053/pMCS2tetP/RsLytB                   5976
ATCC 35053/pMCS2tetP/Stdxs/Rsdds/RsLytB       6751



These results demonstrated that the addition of either EcUbiC or RsLytB increased the 
production of CoQ(10) as compared to adding vector only. In addition, these results 
demonstrated that the addition of Stdxs, Rsdds, and RsLytB increased the production of 
CoQ(10) as compared to adding only RsLytB. 



In experiment 3, the following data were collected after 69 to 75 hours of
fermentation:



Strain                                        CoQ(10) (ppm) dry weight basis
ATCC 35053/pMCS2tetP                          5072
ATCC 35053/pMCS2tetP/Stdxs/Rsdds/RsLytB       8050






These results demonstrated that the addition of Stdxs, Rsdds, and RsLytB increased the
production of CoQ(10) as compared to adding vector only.



In experiment 4, the following data were collected after 69 to 75 hours of 
fermentation: 



Strain                                        Coenzyme Q10 (ppm) dry weight basis
ATCC 35053/pMCS2tetP                          4503
ATCC 35053/pMCS2tetP/Stdxs/Rsdds              8833



These results demonstrated that the addition of Stdxs and Rsdds increased the production
of CoQ(10) as compared to adding vector only.
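The vector-only control appears in all four 300 mL experiments, which allows a rough look at run-to-run variation. The sketch below summarizes the control values reported above and then compares the one construct that was run twice (Stdxs/Rsdds, experiments 1 and 4) against the control of its own experiment. It is an illustrative calculation on the reported numbers, not an analysis performed in the original work.

```python
from statistics import mean, stdev

# Vector-only control (ATCC 35053/pMCS2tetP) titers by 300 mL experiment, in ppm.
VECTOR_ONLY_PPM = {1: 5250, 2: 5483, 3: 5072, 4: 4503}

# The Stdxs/Rsdds construct was measured in experiments 1 and 4.
STDXS_RSDDS_PPM = {1: 6875, 4: 8833}

control_mean = mean(VECTOR_ONLY_PPM.values())
control_sd = stdev(VECTOR_ONLY_PPM.values())  # sample standard deviation

# Ratio of the construct to the control from the same experiment.
within_experiment_ratio = {
    number: ppm / VECTOR_ONLY_PPM[number]
    for number, ppm in STDXS_RSDDS_PPM.items()
}

print(f"control mean {control_mean:.0f} ppm, sd {control_sd:.0f} ppm")
for number, ratio in within_experiment_ratio.items():
    print(f"experiment {number}: Stdxs/Rsdds = {ratio:.2f}x vector only")
```

Since the same construct gives noticeably different ratios in the two runs, each result is best read against the control from its own experiment, as the text does.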

OTHER EMBODIMENTS 

It is to be understood that while the invention has been described in conjunction
with the detailed description thereof, the foregoing description is intended to illustrate and
not limit the scope of the invention, which is defined by the scope of the appended claims.
Other aspects, advantages, and modifications are within the scope of the following
claims.

WHAT IS CLAIMED IS:

1. An isolated nucleic acid comprising a nucleic acid sequence having a length and a
percent identity to the sequence set forth in SEQ ID NO:1 over said length, wherein the
point defined by said length and said percent identity is within the area defined by points
A, B, C, and D of Figure 26, wherein point A has coordinates (3626, 100), point B has
coordinates (3626, 65), point C has coordinates (50, 65), and point D has coordinates (12,
100).
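The geometric condition recited above (a point given by length and percent identity lying within the area bounded by points A, B, C, and D) can be illustrated with a point-in-polygon test. The sketch below is for illustration only and has no bearing on claim interpretation: it takes the x coordinate as length and the y coordinate as percent identity, and it assumes that points on the boundary count as inside. The function name is hypothetical.

```python
# Vertices recited for SEQ ID NO:1, as (length, percent identity).
AREA_SEQ_ID_1 = [(3626, 100), (3626, 65), (50, 65), (12, 100)]  # A, B, C, D


def point_in_convex_polygon(point, vertices):
    """True if point lies inside or on the edge of a convex polygon.

    The point is inside when it is on the same side of every edge, which is
    checked with the sign of a 2D cross product per edge.
    """
    px, py = point
    sides = []
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
        if cross != 0:  # zero means the point is on this edge's line
            sides.append(cross > 0)
    return all(sides) or not any(sides)


print(point_in_convex_polygon((1000, 90), AREA_SEQ_ID_1))  # long, 90% identical
print(point_in_convex_polygon((30, 70), AREA_SEQ_ID_1))    # short, 70% identical
print(point_in_convex_polygon((30, 99), AREA_SEQ_ID_1))    # short, 99% identical
```

The slanted edge from C to D means that shorter sequences need a higher percent identity to fall within the area.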

2. The isolated nucleic acid of claim 1, wherein said point B has coordinates (3626,
85).

3. The isolated nucleic acid of claim 1, wherein said point C has coordinates (100,
65).

4. The isolated nucleic acid of claim 1, wherein said point C has coordinates (50,
85).

5. The isolated nucleic acid of claim 1, wherein said point D has coordinates (15,
100).

6. The isolated nucleic acid of claim 1, wherein said nucleic acid sequence encodes a
polypeptide.

7. The isolated nucleic acid of claim 6, wherein said polypeptide has DXS activity.

8. The isolated nucleic acid of claim 1, wherein said nucleic acid sequence is as set
forth in SEQ ID NO:1.

9. An isolated nucleic acid comprising a nucleic acid sequence having a length and a
percent identity to the sequence set forth in SEQ ID NO:2 over said length, wherein the
point defined by said length and said percent identity is within the area defined by points
A, B, C, and D of Figure 26, wherein point A has coordinates (1926, 100), point B has
coordinates (1926, 65), point C has coordinates (50, 65), and point D has coordinates (12,
100).

10. The isolated nucleic acid of claim 9, wherein said nucleic acid sequence encodes a
polypeptide.

11. The isolated nucleic acid of claim 10, wherein said polypeptide has DXS activity.

12. An isolated nucleic acid comprising a nucleic acid sequence, wherein said nucleic
acid sequence encodes a polypeptide comprising an amino acid sequence, wherein said
amino acid sequence has a length and a percent identity to the sequence set forth in SEQ
ID NO:3 over said length, wherein the point defined by said length and said percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (641, 100), point B has coordinates (641, 65), point C has coordinates
(25, 65), and point D has coordinates (5, 100).

13. The isolated nucleic acid of claim 12, wherein said polypeptide has DXS activity.

14. An isolated nucleic acid comprising a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:37 over said length, wherein the 
point defined by said length and said percent identity is within the area defined by points 
A, B, C, and D of Figure 26, wherein point A has coordinates (1990, 100), point B has
coordinates (1990, 65), point C has coordinates (50, 65), and point D has coordinates (16,
100).

15. The isolated nucleic acid of claim 14, wherein said point B has coordinates (1990,
85).

16. The isolated nucleic acid of claim 14, wherein said point C has coordinates (100,
55).

17. The isolated nucleic acid of claim 14, wherein said point C has coordinates (50, 
85). 

18. The isolated nucleic acid of claim 14, wherein said point D has coordinates (20,
100).

19. The isolated nucleic acid of claim 14, wherein said nucleic acid sequence encodes
a polypeptide.

20. The isolated nucleic acid of claim 19, wherein said polypeptide has DDS activity.

21. The isolated nucleic acid of claim 14, wherein said nucleic acid sequence is as set
forth in SEQ ID NO:37.

22. An isolated nucleic acid comprising a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:38 over said length, wherein the 
point defined by said length and said percent identity is within the area defined by points
A, B, C, and D of Figure 26, wherein point A has coordinates (1002, 100), point B has
coordinates (1002, 65), point C has coordinates (50, 65), and point D has coordinates (16,
100).

23. The isolated nucleic acid of claim 22, wherein said nucleic acid sequence encodes
a polypeptide.

24. The isolated nucleic acid of claim 23, wherein said polypeptide has DDS activity. 

25. An isolated nucleic acid comprising a nucleic acid sequence, wherein said nucleic
acid sequence encodes a polypeptide comprising an amino acid sequence, wherein said
amino acid sequence has a length and a percent identity to the sequence set forth in SEQ
ID NO:39 over said length, wherein the point defined by said length and said percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (333, 100), point B has coordinates (333, 65), point C has coordinates
(25, 65), and point D has coordinates (5, 100).

26. The isolated nucleic acid of claim 25, wherein said polypeptide has DDS activity. 

27. An isolated nucleic acid comprising a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:40 over said length, wherein the
point defined by said length and said percent identity is within the area defined by points
A, B, C, and D of Figure 26, wherein point A has coordinates (1833, 100), point B has
coordinates (1833, 65), point C has coordinates (50, 65), and point D has coordinates (16,
100).

28. The isolated nucleic acid of claim 27, wherein said point B has coordinates (1833,
85).

29. The isolated nucleic acid of claim 27, wherein said point C has coordinates (100, 
65). 

30. The isolated nucleic acid of claim 27, wherein said point C has coordinates (50,
85).

31. The isolated nucleic acid of claim 27, wherein said point D has coordinates (20,
100).

32. The isolated nucleic acid of claim 27, wherein said nucleic acid sequence encodes
a polypeptide.

33. The isolated nucleic acid of claim 32, wherein said polypeptide has DDS activity.

34. The isolated nucleic acid of claim 27, wherein said nucleic acid sequence is as set
forth in SEQ ID NO:40.

35. An isolated nucleic acid comprising a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:41 over said length, wherein the
point defined by said length and said percent identity is within the area defined by points
A, B, C, and D of Figure 26, wherein point A has coordinates (1014, 100), point B has
coordinates (1014, 65), point C has coordinates (50, 65), and point D has coordinates (16,
100).

36. The isolated nucleic acid of claim 35, wherein said nucleic acid sequence encodes
a polypeptide.

37. The isolated nucleic acid of claim 36, wherein said polypeptide has DDS activity.

38. An isolated nucleic acid comprising a nucleic acid sequence, wherein said nucleic 
acid sequence encodes a polypeptide comprising an amino acid sequence, wherein said 
amino acid sequence has a length and a percent identity to the sequence set forth in SEQ 
ID NO:42 over said length, wherein the point defined by said length and said percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (337, 100), point B has coordinates (337, 65), point C has coordinates
(25, 65), and point D has coordinates (5, 100).

39. The isolated nucleic acid of claim 38, wherein said polypeptide has DDS activity.

40. An isolated nucleic acid comprising a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:95 over said length, wherein the 
point defined by said length and said percent identity is within the area defined by points 
A, B, C, and D of Figure 26, wherein point A has coordinates (2017, 100), point B has
coordinates (2017, 65), point C has coordinates (50, 65), and point D has coordinates (16,
100).

41. The isolated nucleic acid of claim 40, wherein said point B has coordinates (2017,
85).

42. The isolated nucleic acid of claim 40, wherein said point C has coordinates (100,
65).

43. The isolated nucleic acid of claim 40, wherein said point C has coordinates (50,
85).

44. The isolated nucleic acid of claim 40, wherein said point D has coordinates (20,
100).

45. The isolated nucleic acid of claim 40, wherein said nucleic acid sequence encodes
a polypeptide.

46. The isolated nucleic acid of claim 45, wherein said polypeptide has DXR activity.

47. The isolated nucleic acid of claim 40, wherein said nucleic acid sequence is as set
forth in SEQ ID NO:95.

48. An isolated nucleic acid comprising a nucleic acid sequence having a length and a 
percent identity to the sequence set forth in SEQ ID NO:96 over said length, wherein the 
point defined by said length and said percent identity is within the area defined by points
A, B, C, and D of Figure 26, wherein point A has coordinates (1161, 100), point B has
coordinates (1161, 65), point C has coordinates (50, 65), and point D has coordinates (16,
100).

49. The isolated nucleic acid of claim 48, wherein said nucleic acid sequence encodes
a polypeptide.

50. The isolated nucleic acid of claim 49, wherein said polypeptide has DXR activity.

51. An isolated nucleic acid comprising a nucleic acid sequence, wherein said nucleic 
acid sequence encodes a polypeptide comprising an amino acid sequence, wherein said
amino acid sequence has a length and a percent identity to the sequence set forth in SEQ
ID NO:97 over said length, wherein the point defined by said length and said percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (386, 100), point B has coordinates (386, 65), point C has coordinates
(25, 65), and point D has coordinates (5, 100).

52. The isolated nucleic acid of claim 51, wherein said polypeptide has DXR activity.

53. An isolated nucleic acid comprising a nucleic acid sequence of at least 12 
nucleotides, wherein said isolated nucleic acid hybridizes under hybridization conditions
to the sense or antisense strand of a nucleic acid molecule, the sequence of said nucleic
acid molecule being the sequence set forth in SEQ ID NO:1, 2, 37, 38, 40, 41, 95, or 96.

54. The isolated nucleic acid of claim 53, wherein said nucleic acid sequence is at
least 50 nucleotides.

55. The isolated nucleic acid of claim 53, wherein said nucleic acid sequence encodes
a polypeptide.

56. The isolated nucleic acid of claim 53, wherein said polypeptide has DXS, DDS, or
DXR activity.

57. A substantially pure polypeptide comprising an amino acid sequence, wherein 
said amino acid sequence has a length and a percent identity to the sequence set forth in 
SEQ ID NO:3 over said length, wherein the point defined by said length and said percent
identity is within the area defined by points A, B, C, and D of Figure 26, wherein point A
has coordinates (641, 100), point B has coordinates (641, 65), point C has coordinates
(25, 65), and point D has coordinates (5, 100).

58. The substantially pure polypeptide of claim 57, wherein said polypeptide has DXS
activity.

59. A substantially pure polypeptide comprising an amino acid sequence, wherein 
said amino acid sequence has a length and a percent identity to the sequence set forth in 
SEQ ID NO:39 over said length, wherein the point defined by said length and said 
percent identity is within the area defined by points A, B, C, and D of Figure 26, wherein
point A has coordinates (333, 100), point B has coordinates (333, 65), point C has
coordinates (25, 65), and point D has coordinates (5, 100).

60. The substantially pure polypeptide of claim 59, wherein said polypeptide has DDS
activity.

61. A substantially pure polypeptide comprising an amino acid sequence, wherein
said amino acid sequence has a length and a percent identity to the sequence set forth in
SEQ ID NO:42 over said length, wherein the point defined by said length and said
percent identity is within the area defined by points A, B, C, and D of Figure 26, wherein
point A has coordinates (337, 100), point B has coordinates (337, 65), point C has
coordinates (25, 65), and point D has coordinates (5, 100).

62. The substantially pure polypeptide of claim 61, wherein said polypeptide has DDS
activity.

63. A substantially pure polypeptide comprising an amino acid sequence, wherein 
said amino acid sequence has a length and a percent identity to the sequence set forth in 
SEQ ID NO:97 over said length, wherein the point defined by said length and said 
percent identity is within the area defined by points A, B, C, and D of Figure 26, wherein
point A has coordinates (386, 100), point B has coordinates (386, 65), point C has
coordinates (25, 65), and point D has coordinates (5, 100).






WO 02/26933 



PCT/US01/30328 



64. The substantially pure polypeptide of claim 63, wherein said polypeptide has
DXR activity.

65. A host cell comprising an isolated nucleic acid of claim 1, 9, 12, 14, 22, 25, 27,
35, 38, 40, 48, 51, or 53.

66. The host cell of claim 65, wherein said host cell is prokaryotic. 

67. The host cell of claim 65, wherein said host cell is selected from the group
consisting of Rhodobacter, Sphingomonas, and Escherichia cells.

68. The host cell of claim 65, wherein said host cell comprises an exogenous nucleic
acid that encodes a polypeptide having DDS, DXS, ODS, SDS, DXR,
4-diphosphocytidyl-2C-methyl-D-erythritol synthase,
4-diphosphocytidyl-2C-methyl-D-erythritol kinase, or chorismate lyase activity.

69. The host cell of claim 65, wherein said host cell comprises an exogenous nucleic 
acid comprising an UbiC sequence or LytB sequence. 

70. The host cell of claim 65, wherein said host cell comprises an exogenous nucleic
acid comprising an UbiC sequence and LytB sequence.

71. The host cell of claim 65, wherein said host cell comprises non-functional crtE
sequence, ppsR sequence, or ccoN sequence.

72. The host cell of claim 65, wherein said host cell comprises non-functional crtE
sequence, ppsR sequence, and ccoN sequence.

73. A host cell comprising an exogenous nucleic acid and a non-functional crtE
sequence, ppsR sequence, or ccoN sequence, wherein said exogenous nucleic acid is
within a crtE, ppsR, or ccoN locus of said host cell.

74. A host cell comprising a genomic deletion, wherein said deletion comprises at 
least a portion of a crtE sequence, ppsR sequence, or ccoN sequence, and wherein said
host cell comprises a non-functional crtE sequence, ppsR sequence, or ccoN sequence.

75. A method for increasing production of CoQ(10) in a cell having endogenous DDS
activity, said method comprising inserting a nucleic acid molecule comprising a nucleic
acid sequence that encodes a polypeptide having DDS activity into said cell such that
production of CoQ(10) is increased.

76. The method of claim 75, wherein said nucleic acid molecule comprises an isolated 
nucleic acid of claim 14, 22, 25, 27, 35, 38, or 53. 

77. The method of claim 75, wherein the production of CoQ(10) is increased at least
about 5 percent as compared to a control cell lacking said inserted nucleic acid molecule.

78. The method of claim 75, wherein said cell is selected from the group consisting of
Rhodobacter and Sphingomonas cells.

79. The method of claim 75, wherein said cell is a membraneous bacterium. 

80. The method of claim 75, wherein said cell is a highly membraneous bacterium. 

81. The method of claim 75, wherein said method further comprises inserting a
second nucleic acid molecule comprising a nucleotide sequence that encodes a
polypeptide having DXS activity into said cell.

82. The method of claim 81, wherein said second nucleic acid molecule comprises an
isolated nucleic acid of claim 1, 9, or 12.

83. A method for increasing production of CoQ(10) in a cell having endogenous DDS
activity, said method comprising inserting a nucleic acid molecule comprising a nucleic
acid sequence that encodes a polypeptide having DXS activity into said cell such that
production of CoQ(10) is increased.

84. The method of claim 83, wherein the production of CoQ(10) is increased at least
about 5 percent as compared to a control cell lacking said inserted nucleic acid molecule.

85. The method of claim 83, wherein said cell is selected from the group consisting of
Rhodobacter and Sphingomonas cells.

86. The method of claim 83, wherein said nucleic acid molecule comprises an isolated
nucleic acid of claim 1, 9, or 12.

87. The method of claim 83, wherein said cell is a membraneous bacterium.

88. The method of claim 83, wherein said cell is a highly membraneous bacterium. 

89. The method of claim 83, wherein said method further comprises inserting a 
second nucleic acid molecule comprising a nucleotide sequence that encodes a
polypeptide having DDS activity into said cell.

90. The method of claim 89, wherein said second nucleic acid molecule comprises an
isolated nucleic acid of claim 14, 22, 25, 27, 35, 38, or 53.

91. A method for increasing production of CoQ(10) in a membraneous bacterium,
said method comprising inserting a nucleic acid molecule comprising a nucleic acid
sequence that encodes a polypeptide having DDS activity into said bacterium such that
production of CoQ(10) is increased.

92. A method for increasing production of CoQ(10) in a highly membraneous
bacterium, said method comprising inserting a nucleic acid molecule comprising a nucleic
acid sequence that encodes a polypeptide having DDS activity into said highly
membraneous bacterium such that production of CoQ(10) is increased.

93. A method for making an isoprenoid, said method comprising culturing a cell
under conditions wherein said cell produces said isoprenoid, said cell comprising at least
one exogenous nucleic acid that encodes at least one polypeptide, wherein said cell
produces more of said isoprenoid than a comparable cell lacking said at least one
exogenous nucleic acid.

94. The method of claim 93, wherein said cell is selected from the group consisting of
Rhodobacter and Sphingomonas cells.

95. The method of claim 93, wherein said isoprenoid is CoQ(10).

96. The method of claim 93, wherein said at least one polypeptide has DDS, DXS,
ODS, SDS, DXR, 4-diphosphocytidyl-2C-methyl-D-erythritol synthase,
4-diphosphocytidyl-2C-methyl-D-erythritol kinase, or chorismate lyase activity.

97. The method of claim 93, wherein said at least one polypeptide is a UbiC
polypeptide or a LytB polypeptide.

98. The method of claim 93, wherein said cell comprises a non-functional crtE
sequence, ppsR sequence, or ccoN sequence.

99. The method of claim 93, wherein said cell comprises a non-functional crtE 
sequence, ppsR sequence, and ccoN sequence. 

100. The method of claim 93, wherein said cell comprises a genomic deletion,
wherein said deletion comprises at least a portion of a crtE sequence, ppsR sequence, or
ccoN sequence, and wherein said cell comprises a non-functional crtE sequence, ppsR
sequence, or ccoN sequence.

101. A method for making an isoprenoid, said method comprising culturing a
genetically modified cell under conditions wherein said cell produces said isoprenoid.

102. The method of claim 101, wherein said isoprenoid is CoQ(10).

103. The method of claim 101, wherein said cell comprises an exogenous nucleic acid.

104. The method of claim 101, wherein said cell comprises a genomic deletion.
Figure 1

Figure 2 (page 1 of 2)

1 ctgcggccag accacgcata tcgacgacga ttcgatcacg aaaaacgtac 

51 ggtccgcagc ccagcacgcc ggtttttcgc cggtccggcc ggtgatcgag 

101 gtgcgcggca agtgcggcaa gtgtgactga cctgtccaac agaccgttcg 

151 acttgagact aacgttgcgc taacaaagcc catggctgac ctacccaaga 

201 cgccgctgct cgacacggtc gacacgccgc aggacctccg gaagctcgcc 

251 cccgcccagc tgcgccagct ggccgacgag cttcgtgccg aaaccatcag 

301 tgcggtgggc tccaccggcg ggcatctagg ctccggcctg ggcgtcgtcg 

351 aactgacggt ggcgatccac tatgtattca acacccccga cgaccggctg 

401 atctgggacg tcgggcacca atgctatccg cacaagatcc tcaccggtcg 

451 gcgcgatcgg atccgcacga ttcgtcaggg tggaggcctc tccggcttca 

501 ccaagcgcag cgagagcgag tatgatccgt tcggtgccgc gcactcgtcg 

551 acctcgatct cggccgcact cggctttgcg atcgccaaca agctcaacga 

601 ggcgccgggc aaggcgatcg cggtgatcgg cgacggcgcg atgagcgcgg 

651 gcatggccta tgaggcgatg aacaacgccg aggccgccgg caaccggctg 

701 gtggtgatcc tcaacgacaa cgacatgtcg atcgccccgc cggtgggcgg 

751 gctttcggcc tatcttgcgc gcctcatttc ctcgtccgaa tatctcggcc 

801 tgcgcgagct cgccaagcgc ttcacccgca agctttcgcg ccgcctcacc 

851 gcggcagccg gcaaggcgga ggaattcgcc cgcggcatgg cgaccggcgg 

901 cacgctgttc gaggaacttg gcttctatta tgtcggcccg atcgacggcc 

951 acaatctcga gcatctgatc ccggtgctgg agaatgtccg cgacagcgag 

1001 cagggcccga tcctgatcca tgtcgtgacc aagaagggca agggctatgc 

1051 cccggccgaa gcggcggcgg acaagtatca cggcgtccag aagttcgacg 

1101 tgatcaccgg ggcacaggcc aaggcacccc cgggcccgcc cgcctatacc 

1151 aaggtgttcg ccgatgcgct gctcgccgaa gcggagcgtg atgcgtcggt 

1201 ctgcgcgatc accgcggcga tgccctcggg caccgggctc gacaagttcc 

1251 aggcgacgtt ccccgatcgc accttcgacg tgggcattgc cgaacagcac 

1301 gcggtcacct tcgcagcggg ccttgccgcg caggggatgc ggccgttctg 

1351 cgcgatctac tcgaccttcc tgcagcgcgc ctacgaccag gtcgtccacg 

1401 acgtcgcgat ccagaacctg ccggtccgct tcgcgatcga ccgcgcgggc 

1451 ctggtcggtg ccgacggcgc gacccatgcc ggcagcttcg acgtgaccta 

1501 tctcgccagc ctgcccaatt tcgtggtgat ggcggccgcg gacgaggtcg 

1551 agctcgtcca catgacccac acggcggcga tgcacgacag cggcccgatc 

1601 gcgctgcgct atccacgcgg caacggcgtc ggactggcgc tgcccaaggt 

1651 tccggagcgg ctggaaatcg gcaagggtcg cgtggtccga gagggcaaga 

1701 aggtagcgat cctgtcgctc ggcacgcgcc ttgcggaagc actaaaggcc 

1751 gccgacacgc tcgaggccaa gggcctctcg accaccgtcg ccgacctgcg 

1801 cttcgccaaa ccgctcgacg aggatctgat ccgccgcctg ctcaccaccc 

1851 acgaagtggc ggtgacgatc gaggaaggcg cgatcggcgg ccccggtgcg 

1901 catgtgctga cgctcgccag cgataccggc ctgatcgacg ccggcctcaa 

1951 gctgcgcacc atgcgcctgc cggacatatt ccaggaccag gacaagcccg 

2001 agaagcagta tgacgaagcg gggctgaacg ccgccaacat cgtcgacacg 

2051 gtgctgaagg cgctccgcta caacgaggcc gagctggccg acggggtgcg 

2101 ggcgtaaacg acgccagatc ctccccggaa cggggagggg aaccgccgcc 

2151 gaaggcggtg gtggaggggc cgctgcggca cgcancggtt tcccaggctg 



2201 agagcgatcc gcgccttgcg gcgcgccccc cccaccattc gctggcgcgg

2251 atggtccccc tccccgttcc ggggaggatc tgggtcctgc cccaccttga

2301 atctccaaca tgcacatgcc atgtacatgc acatggctac gcagcttccc

2351 cagactcgct ccagccgcgt tgtcgtgctg gtatcgcccg aggaaaaacg

2401 gcgcatttcc gccaatgcgg aagcggcgga catgacggtc agcgacttca

2451 tgcgcaccgc cgccgaacgc tataccgagc cgaccgacgc cgagatggcg

2501 ctgatgcgcg acctgctcgc ccagctcgaa caggccaatg cccgcacgga

2551 cgcggccttt gcccagctcg aagctgcgcg cgccgccgcc accgcgttcg

2601 acgaggaggc gtatcgcgcc gaggtccgcg aacagctgct gaccgatacc

2651 tcaatcgact gggatgcgct gtccactgcc ctttccggct gggcgcgcca

2701 gtgagcttct ggaccgatgc gctacgcgcg ctccagcagg tcgcgctgct

2751 ccagcacaag gtcgagcagg cgctgaccac cgccgaggaa gcccgccgcc

2801 attcaatcga gacgcgcgag cgggtgatcc ggcttgagac gctgatcgac

2851 atcgcgatga gacgccagcc cgcagcaccg cctacgccgc ctgcgcttcc

2901 cgaaagtcca caaaccggca gctagcgccc gcttccccga gcgcgtacat

2951 cgcggtacgt gctgaaaatg accatccttc ccctcaccgc ccgcccccgc

3001 gcgctcgcgc actggctgtt cgtcgtcgcc gcgatgatcg tcgcgatggt

3051 cgtggtcggg ggcattaccc ggctcaccga atcgggcctg tcgatcaccg

3101 aatggaagcc aatctccggc atcgtgcccc cgctcaacga cgcgcagtgg

3151 caggccgagt tcgaccacta caagcagatc ggccagtatg agcagctcaa

3201 ccagggcatg acgctcggcg ggttcaagag catcttcttc tgggaatata

3251 tccaccgcct gctcggccgg ctgatcggca tggtgttcgc gctgccgctg

3301 ctgtggttcg ccgtccgcaa gcagatcccg cagggctatg gctggcggct

3351 ggtcgcgctg ctcgcgctag gcgggctgca gggcgcgttc ggctggtgga

3401 tggtgaagtc ggggctcaac cacacccgca cctcggttag ccatttctgg

3451 ctggcgaccc acctgatgac cgcactgttc acgctgggcg gcatcgtctg

3501 gacgatgctc gacctgcgcg cgcttgccgc caaccatgcc gagcgccctg

3551 cccgactgac cgggctcggc gcgggcgtgc tggtactgct ggcggtccag

3601 ctcttctacg gggcgctggt agcagg (SEQ ID NO:1)
Figure 3

1 atggctgacc tacccaagac gccgctgctc gacacggtcg acacgccgca 

51 ggacctccgg aagctcgccc ccgcccagct gcgccagctg gccgacgagc 

101 ttcgtgccga aaccatcagt gcggtgggct ccaccggcgg gcatctaggc 

151 tccggcctgg gcgtcgtcga actgacggtg gcgatccact atgtattcaa 

201 cacccccgac gaccggctga tctgggacgt cgggcaccaa tgctatccgc 

251 acaagatcct caccggtcgg cgcgatcgga tccgcacgat tcgtcagggt 

301 ggaggcctct ccggcttcac caagcgcagc gagagcgagt atgatccgtt 

351 cggtgccgcg cactcgtcga cctcgatctc ggccgcactc ggctttgcga 

401 tcgccaacaa gctcaacgag gcgccgggca aggcgatcgc ggtgatcggc 

451 gacggcgcga tgagcgcggg catggcctat gaggcgatga acaacgccga

501 ggccgccggc aaccggctgg tggtgatcct caacgacaac gacatgtcga 

551 tcgccccgcc ggtgggcggg ctttcggcct atcttgcgcg cctcatttcc 

601 tcgtccgaat atctcggcct gcgcgagctc gccaagcgct tcacccgcaa 

651 gctttcgcgc cgcctcaccg cggcagccgg caaggcggag gaattcgccc 

701 gcggcatggc gaccggcggc acgctgttcg aggaacttgg cttctattat 

751 gtcggcccga tcgacggcca caatctcgag catctgatcc cggtgctgga 

801 gaatgtccgc gacagcgagc agggcccgat cctgatccat gtcgtgacca 

851 agaagggcaa gggctatgcc ccggccgaag cggcggcgga caagtatcac 

901 ggcgtccaga agttcgacgt gatcaccggg gcacaggcca aggcaccccc 

951 gggcccgccc gcctatacca aggtgttcgc cgatgcgctg ctcgccgaag 

1001 cggagcgtga tgcgtcggtc tgcgcgatca ccgcggcgat gccctcgggc 

1051 accgggctcg acaagttcca ggcgacgttc cccgatcgca ccttcgacgt 

1101 gggcattgcc gaacagcacg cggtcacctt cgcagcgggc cttgccgcgc 

1151 aggggatgcg gccgttctgc gcgatctact cgaccttcct gcagcgcgcc 

1201 tacgaccagg tcgtccacga cgtcgcgatc cagaacctgc cggtccgctt 

1251 cgcgatcgac cgcgcgggcc tggtcggtgc cgacggcgcg acccatgccg 

1301 gcagcttcga cgtgacctat ctcgccagcc tgcccaattt cgtggtgatg 

1351 gcggccgcgg acgaggtcga gctcgtccac atgacccaca cggcggcgat 

1401 gcacgacagc ggcccgatcg cgctgcgcta tccacgcggc aacggcgtcg 

1451 gactggcgct gcccaaggtt ccggagcggc tggaaatcgg caagggtcgc 

1501 gtggtccgag agggcaagaa ggtagcgatc ctgtcgctcg gcacgcgcct 

1551 tgcggaagca ctaaaggccg ccgacacgct cgaggccaag ggcctctcga 

1601 ccaccgtcgc cgacctgcgc ttcgccaaac cgctcgacga ggatctgatc 

1651 cgccgcctgc tcaccaccca cgaagtggcg gtgacgatcg aggaaggcgc 

1701 gatcggcggc cccggtgcgc atgtgctgac gctcgccagc gataccggcc 

1751 tgatcgacgc cggcctcaag ctgcgcacca tgcgcctgcc ggacatattc 

1801 caggaccagg acaagcccga gaagcagtat gacgaagcgg ggctgaacgc 

1851 cgccaacatc gtcgacacgg tgctgaaggc gctccgctac aacgaggccg 

1901 agctggccga cggggtgcgg gcgtaa (SEQ ID NO: 2) 
Figure 4

1 madlpktpll dtvdtpqdlr klapaqlrql adelraetis avgstgghlg
51 sglgvveltv aihyvfntpd drliwdvghq cyphkiltgr rdrirtirqg
101 gglsgftkrs eseydpfgaa hsstsisaal gfaianklne apgkaiavig
151 dgamsagmay eamnnaeaag nrlvvilndn dmsiappvgg lsaylarlis
201 sseylglrel akrftrklsr rltaaagkae efargmatgg tlfeelgfyy
251 vgpidghnle hlipvlenvr dseqgpilih vvtkkgkgya paeaaadkyh
301 gvqkfdvitg aqakappgpp aytkvfadal laeaerdasv caitaampsg
351 tgldkfqatf pdrtfdvgia eqhavtfaag laaqgmrpfc aiystflqra
401 ydqvvhdvai qnlpvrfaid raglvgadga thagsfdvty laslpnfvvm
451 aaadevelvh mthtaamhds gpialryprg ngvglalpkv perleigkgr
501 vvregkkvai lslgtrlaea lkaadtleak glsttvadlr fakpldedli
551 rrllttheva vtieegaigg pgahvltlas dtglidaglk lrtmrlpdif
601 qdqdkpekqy deaglnaani vdtvlkalry neaeladgvr a (SEQ ID
NO:3)
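The relationship between the open reading frame of Figure 3 and the amino acid sequence of Figure 4 can be checked with a short translation routine. The following is a minimal sketch in Python that assumes the standard genetic code; the function name `translate` and the variable names are illustrative assumptions, and only the first and last rows of Figure 3 are used as input.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a DNA string in frame 1, stopping at the first stop codon."""
    dna = dna.lower().replace(" ", "")
    residues = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        residues.append(amino_acid.lower())
    return "".join(residues)


first_row = "atggctgacc tacccaagac gccgctgctc"  # nucleotides 1-30 of Figure 3
last_row = "agctggccga cggggtgcgg gcgtaa"       # nucleotides 1901-1926 of Figure 3

print(translate(first_row))
# Nucleotide 1903 begins a codon (1902 is divisible by 3), so the first two
# bases of the last row are skipped to stay in frame.
print(translate(last_row.replace(" ", "")[2:]))

# 1926 nucleotides correspond to 642 codons: 641 residues plus one stop codon.
orf_length = 1926
print(orf_length // 3 - 1)
```

The translated ends agree with the first and last residues shown in Figure 4, and the codon count is consistent with the lengths 1926 and 641 recited for SEQ ID NO:2 and SEQ ID NO:3.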
STdxsdna 182 atg

CRdxsdna 1 atgctgcgtggtgctgtttctcacggccctgcggtcgccg 

CJdxsdna 1 

PAdxsdna 1 atg 

LEdxsdna 1 atg 

MTdxsdna 1 

RSdxsldna 1 atg 

RSdxs2dna 1 atg 

SPCCdxsdna 1 

ECdxsdna 1 atg 

NMdxsdna 1 

HIdxsdna 1 atg 

SSdxsdna 1 

HPdxsdna 1 



STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxsldna 

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 



185 



-gct- 



41 accgggctgccgct- 

1 

4 

4 



—at 

-cccaagacgctccatgagattccccgc — 
-gctttgtgtgcttatgcatttcctgggat 



1 
4 
4 
1 
4 
1 
4 
1 
1 



■acc- 
-acc- 



-agtttt- 



-act- 



-gt- 



STdxsdna 188 

CRdxsdna 55 

CJdxsdna 3 

PAdxsdna 31 

LEdxsdna 33 tttgaacaggactggtgtggtttcagattcttctaaggca 

MTdxsdna 1 

RSdxsldna 7 

RSdxs2dna 7 

SPCCdxsdna 1 

ECdxsdna 10 

NMdxsdna 1 

HIdxsdna 7 

SSdxsdna 1 

HPdxsdna 3 
STdxsdna 188

CRdxsdna 55

CJdxsdna 3

PAdxsdna 31

LEdxsdna 73

MTdxsdna 1

RSdxs1dna 7

RSdxs2dna 7

SPCCdxsdna 1

ECdxsdna 10

NMdxsdna 1

HIdxsdna 7

SSdxsdna 1

HPdxsdna 3


STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxsldna 

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 



-gacc 

-ggcc 

-ga-g— - 
-gage 



-gaca 

-aatc 



-gata 

— atg 

-aacaata 



192 t- 
59 c- 
6 t- 

g- 

t- 



gatt 

ac cc 

eg cccgctgcgctgctcccg 

aa aa 

cc cc 

gcagtttttgttcc 



35 
110 

1 

11 g ac 

11 ccaccccgcgac 

1 

14 

4 
14 

1 

7 



*cc- 
■cc- 



-tg- 
■ac- 
-ga- 



•cc- 
■cc- 
•ac- 



-tg- 



-ca- 



STdxsdna 197

CRdxsdna 80 tcgcccgtggtgtgcgcagcgcagcgcccacgcgtcagcg

CJdxsdna 11

PAdxsdna 40

LEdxsdna 125

MTdxsdna 1

RSdxs1dna 16

RSdxs2dna 25

SPCCdxsdna 1

ECdxsdna 19

NMdxsdna 9

HIdxsdna 19

SSdxsdna 1

HPdxsdna 12
STdxsdna 197

CRdxsdna 120 tcgcgcggaggcttcggtcaatgccccgcgggcgggcccg 

CJdxsdna 11 

PAdxsdna 40 

LEdxsdna 125 

MTdxsdna 1 

RSdxsldna 16 

RSdxs2dna 25 

SPCCdxsdna 1 

ECdxsdna 19 

NMdxsdna 9 

HIdxsdna 19 

SSdxsdna 1 

HPdxsdna 12

STdxsdna 197 

CRdxsdna 160 gccggtagctactcgggcgagtgggataagctttcagtgg 

CJdxsdna 11 

PAdxsdna 40 

LEdxsdna 125 

MTdxsdna 1 

RSdxsldna 16 

RSdxs2dna 25 

SPCCdxsdna 1 

ECdxsdna 19 

NMdxsdna 9 

HIdxsdna 19 

SSdxsdna 1

HPdxsdna 12 



STdxsdna 197 aag — acg 

CRdxsdna 200 aggagattgatgagtggcgcgatgtgggcccgaag — acg 

CJdxsdna 11 aat--ttg 

PAdxsdna 40 gcc — acg 

LEdxsdna 125 aac— aca 

MTdxsdna 1 

RSdxsldna 16 tgc — acg 

RSdxs2dna 25 gaa — acc 

SPCCdxsdna 1 atg 

ECdxsdna 19 aaa--tac 

NMdxsdna 9 aag c 

HIdxsdna 19 aat — tat 

SSdxsdna 1 gtg 

HPdxsdna 12 aaataaaa 
STdxsdna

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxsldna 

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 

STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxsldna 

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 

STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxsldna 

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 



203 ccgctgctc- 
238 cccctgctg- 
17 cccatactc- 
46 cccctgctc- 



-gacacggtcga- 
-gacactgtcaa- 

-aa 

-gaccgcgcctc- 



-ca 
-tt 



-tt 



131 age t tact cat gaggtcaagaaaaggtcacgtgtggttca 



1 atgctg- 

22 ccgacgctc- 

31 cegcttttg- 

4 catctcagc- 

25 ccgaccctg- 

13 cccctactc- 

25 cctctttta- 

4 acgattctg- 

20 cttttgatt- 



-caacagatccg- 
-gae-egggtga- 
-gatcgegtctg- 

-gaaa ttac- 

-gcactggtcga- 
-gacctgattga- 
-tctttaattaa- 
-gagaacatccg- 
-taaaccctaac- 



■cg 
■eg 
-ct 
-cc 
-ct 
-ca 
-tt 
-gc 

-ga 



225 
260 
28 
68 
171 
20 
43 
53 
23 
47 
35 
47 
26 
42 



cgee 
accc 



cgee 
ggct 
ggee 
ctcc 
gece 
atcc 
ccac 
gece 
ctcc 
aacc 
tatt 



-gcaggacc teegga ag 

ggtgcacc tgaaga ac 

-gaagagt tagaaa ag 

-ggecgaac tgegee gg 

-tccttatcagaatctggagaatactacacacagag 

-egctgate tgeage ac 

cggtggaca taaagg gc 

-ggecgaca tgaagg eg 

-caaccagc tccacg gg ' 

ccaggagt tacgac tg 

-gcaagatt tgegee gt 

-agaagatt tgcgtc tt 

-acgcgacc tgaagg eg 

-gcagg- cttgg ag 



245 ctcgcccccgcccagctgcgccag

280 ttcaacaatgagcagctgaagcag

43 ctaagtttaaaagaattagaaaat

88 ctgggcgaggcggacctggaaacc

210 accgccaacgcctattttggacactgtgaactatcccatt

40 ctttcccaggcgcagcttcgggag

64 ctcacggaccgtgagttgcgctcg

73 ctgagtgacgccgaactggagcgg

43 ttgtcggttgctcagcttgagcaa

67 ttgccgaaagagagtttaccgaaa

55 ctggacaaaaaacagctgccgcgc

67 ttaaataaagatcagctaccacaa

46 ctgcccgaggagcagctgcacgaa

58 tt ggtgtgtcaa
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2 69 ctgg 

304 ctct 

67 ttag 

112 ctgg 

250 catatgaaaaatctgtctctgaaggaacttaaacaactag 

64 ctgg 

88 ctgg 

97 ctgg 

67 attg

91 ctct

79 cttg

91 ctct

70 ctgt

70 acg- 

273 ccgacgagcttcgtgccgaaacca-tcagtg — cggtggg 
308 gcaaggagctgcgcagtgacatcg-tgcaca — ccgtctc 
71 cagcatctatgcgtgaaaaaatca-tacaag — ttgtgag 
116 ccgacgagct--gcgccagtacct-gctgtataccgtcgg 
290 cagatgaactaaggtcagatacaa-ttttca--atgtatc
68 ccgccgagatccgtgagttcctga-tccaca — aggttgc 
92 ccgacgagctgcgggccgaaacga-tctcgg — ccgtgtc 

101 ccgacgaagtgcgttccgaggtga-tttcgg — tcgttgc 
71 gccaccagattcgtgagaagcacc-tgcaga--cggttgc 
95 gcgacgaactgcgccgctatttac-tcgaca — gcgtgag 
83 ccggcgagttgcgcacctttctgc-tggaat — ctgtcgg 
95 gtcaagaattacgtgcttatcttt-tagaat — ctgttag 
74 ccgaggaga-tcaggcagttcctggtgcacg — cggtcac 
73 -ctacg-gaatcgt attt-tagaag — tggtgag 

310 ctccaccggcgggcatctaggctccggcctgggcgtcgtc 

345 tcgcaccggtggacaccttagcagcagcctgggcgtggtg 

108 taaaaatggtgggcatttaagttcaaatttgggtgctgta 

153 ccagaccggcggtcatttcggcgccggcctcggcgtggtc 

327 aaagactgggggtcaccttggctcaagtcttggtgttgtt 

105 cgccacgggggggcatctggggccgaacctgggagtggtg 

129 ggtgacgggcgggcatctgggcgcaggcctcggcgtggtg 

138 cgagacgggaggacatctggggtcctcgctgggggtggtt 

108 agcgaccggtgggcacctcgggccgggcttgggcgtggtg 

132 ccgttccagcgggcacttcgcctccgggctgggcacggtc 

120 gcagaccggcgggcatttcgccagcaatttgggcgcggtc 

132 tcaaactagcggacatttagcgtcaggtttaggcactgta 

111 cagaaccggcggtcatctgggacccaacctgggggtggtg 

102 cgctaatggggggcatttaagctcttctttaggggctgtg 
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350 gaactgacggtggcgatccactatgtattcaacacccccg 
385 gagctgacggtggctatgcactatgtattcaacaccccgg 
148 gaacttagtatagcaatgcatttggtttttgatgcaaaaa 
193 gagctgaccattgccctgcactacgtcttcgacactccgg 

367 gagctgactgttgctcttcattatgtcttcaatgcaccgc
145 gaactcaccttggcgctgcaccgggtattcgactcgccgc
169 gagttgacggttgcgctgcatgcgatcttcgatgcgcccc
178 gagctgactgtcgcgctgcatgcggtcttcaacacgccca
148 gaattgaccctagcgctttaccaaacgctcgatctcgatc
172 gaactgaccgtggcgctgcactatgtctacaacaccccgt 
160 gagctgacggttgcgctgcactacgtttacaacacgcccg 
172 gagctaaccgttgcgctgcattatgtatataagacgccat 
151 gagctgaccatcgccctgcaccgggtcttcgagtcgcccg 
142 gagctgattgtgggcatgcatgccttatttgattgccaaa 

390 acgaccggctgatctgggacgtcgggcaccaatgctatcc 
425 aggacaagattatttgggacgtgggccaccaggcgtatgg 
188 aagatccttttatttttgatgtgtcgcatcagtcttatac 
233 acgaccgcctggtctgggacgtcggccaccaggcctatcc 

407 aagataggattctctgggatgttggtcatcagtcttatcc
185 acgatccgatcatcttcgacaccggtcaccaggcctacgt
209 gcgacaagatcatctgggacgtgggccaccagtgctaccc
218 ccgacaagctcgtctgggacgtgggccaccagtgctaccc 
188 gcgacaaagtggtttgggacgttggccaccaagcctatcc 
212 ttgaccaattgatttgggatgtggggcatcaggcttatcc 
200 aagacaagctggtgtgggatgtcggacaccaaagctatcc 
212 ttgatcagttaatttgggatgtgggacatcaagcttatcc 
191 tcgaccgcatcctgtgggacaccggccaccagagctacgt 
182 aaaaccctttcatttttgacacttcgcaccaagcttacgc 

430 gcacaagatcctcaccggtcggcgcgatcgga tccgc

465 ccacaagatcctgactggccgtcgcaagggta tggcc

228 acacaagcttttaagcggaaaagaagaaatat ttgat

273 gcacaagatcctcaccgagcgccgcgagctga tgggc

447 tcacaaaatcttgactggtagaagggacaaga tgtcg

225 ccacaagatgttgaccggacgcagccaggact tcgca

249 ccacaagatcctgaccgggcggcgcgaccgca tccgc

258 ccacaagatcctcaccggccggcgcgagcaga tgcgc 

228 ccacaagctgctgacag ggcgctatcacaacttccat 

252 gcataaaattttgaccggacgccgcgacaaaa tcggc 

240 gcacaaaattcttaccggacgtaaaaaccaga tgcac 

252 acataaaatcctaacgggtcgccgagagcaaa tgtcc 

231 acacaagctgctgacgggacgtcagga ct tctcc 

222 ccacaagcttttaaccgggcgctttgaaagct ttagc 



WO 02/26933 



PCT/US01/30328



12/97 



Figure 5 (page 7 of 25) 



STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxsldna 

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 



467 acgattcgtcagggtggaggcctctccggcttcaccaag-
502 acgattcgccagaccaacggcctttcgggcttcacgaag-
265 actttaagacaaatcaatggtttaagtggttatacaaaa-
310 accctgcgccagaagaacggcctggcggccttcccgcgc-
484 acattaaggcagacagatggtcttgcaggatttactaag-
262 accctgcgtaagaagggcgggttgtcggggtatccgtct-
286 accctgcggcagggcgggggtctctcgggcttcaccaag-
295 accctgcgccagaagggcggcctctcgggcttcaccaag-
265 accttgcggcaaaaggatggcattgcgggctacccgaag-
289 accatccgtcagaaaggcggtctgcacccgttcccgtgg-
277 accatgcgccaatatggcggtttggcgggttttccgaaa-
289 acaattcgccaaaaagacggtat-tcatccttttccttgg
265 aagctgcgcggcaagggcggcctgtccggctacccctcg-
259 actttaaggcaattcaagggtttgagcggctttactaaa-
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506 cgcagcgagagcgagtatgatccgttcggtgccgcgc-ac 
541 cgcgacgagagcgagtacgaccctttcggcgctggcc-ac 

304 cctagcgagggagattat tttgtagcagggc-at 

349 cgcgcagagagcgagtacgacaccttcggcgtcggcc-ac
523 cgatcggagagtgaatatgattgctttggcaccggcc-ac 
301 cgtgccgagagcgagcacga-ctgggtggagtcgagccac 
325 cgctccgagagcccctatgactgtttcggcgcgggcc-at 
334 cgctcggaatccgcctacgacccgttcggcgcggctc-at 
304 cgcacggaaaaccgcttcgatcatttcggtgccggtc-ac 
328 cgcggcgaaagcgaatatgacgtattaagcgtcgggc-at 
316 cgttgcgagtccgagtacgacgcgttcggcgtggggc-at 
328 cgtgaagaaagtgaatttgatgtattaagtgttggtc-ac 
304 cgcgaggagtccgagcacgacgtcatcgagaacagcc-ac 
298 cccagcgagagcgcatacgattatttcatcgccgggc-at 



STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxs1dna

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 



545 tcgtcgacctcgatctcggccgcact 
580 agctccacctcgatttcggcggctct 
337 tctagtacctctatttctttggcagt 
388 tccagcacctccatcagcgccgccct 
562 agttccaccaccatctcagcaggcct 
340 gccagcgcggcgctgtcgtacgcgga 
364 tcctcgacctcgatctcggccgcggt 
373 tcctcgacctcgatctcggccgcgct 
343 gcttccaccagtatttctgctggcct 
367 tcatcaacctccatcagtgccggaat
355 tcctccacctccatcggcgcggcgtt
367 tcctctacgtctattagtgcgggatt
343 gcctccac — cgccctcggctgggcc 
337 agttccacttcggtgt ctat 



— cggctttgcgat 
— gggtatggcggt 
— aggtgcttgtaa 
— gggcatggccat 
— agggatggctgt 
— cgggttggccaa 
— gggctttgccgc 
-cggctttgccat 
— cggtatggctct 
— tggtattgcggt 
— gggcatggcggc 
— aggcattgccgt 
gacggactcgccaa 
--aggcgttggggt 
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583 cgc-c- 
618 ggg-c- 
375 ggc-t- 
426 cgc-c- 
600 tgg-t- 
378 ggc-g- 
402 ggc-a- 
411 ggg-t- 
381 agcac- 



-aacaagctc- 
-cgcgacgtt- 
-attgcttta- 
-gcccgcctg- 
-agagatcta- 
-ttcgagttg- 
-cgcgagatg- 
-cgcgagctg- 
-gggatgccc- 



-aacgag-gc-- 
-aagggc-aa*- 
-aagggt-ga — 
-caaggc-aa-- 
-aaagga-ag — 
-accg-g-ac — 
-ggcggc~ga — 
-ggccag-cc— 
-agggcg-aa — 



4 05 tgc-tgccgaaaaagaaggca aa — aa-tc — 

393 ggc-g gacaaacag ttgggcagc — 

4 05 tgc-c gcag aacgag-aaaa 

381 ggc-c cgccgggtg cagggg-ga — 

369 ggc-t a — aagctttttgtttgaaacaa-gc — 



604 -gccgg — gcaaggc- 
639 -gaaga — acagtgt- 
396 -aaagc — gtattcc- 
4 47 -ggagc — gtaagtc- 
621 -aaaca — acaatgt- 
398 -accgc--aaccggca 
423 -cacgg — gcgacgc- 
4 32 -cgtgg--gcgacac- 
4 03 -gacta — ccgatgt- 

431 -gcc gca c- 

415 -gaccg — ccgcagc 
423 tgcaggtagaaaaac- 
4 02 -gaagg — gccatgt- 
396 -gctag — gcatgcc- 



gatcgcggtgatcggcgacgg 

gatcgctgtcatcggcgacgg 

tgttgctttgattggagatgg 

ggtggccgtgatcggcgacgg 

tattgccgtaataggtgatgg 

tgtggtcgcggtggtcggtgacgg 

ggtggcggtgatcggcgacgg 

gatcgccgtgatcggcgacgg 

g-tcgctgtgattggtgatgg 

cgtctgtgtcattggcgatgg 

-g-tcgccatcatcggcgacgg 
-agtatgcgtaatcggtgatgg 

cgtcgccgtcatcggcggacg 

— catagctttattaggcgatgg 



637 cgcgatgagcgcgggcatggcctatgaggcgatgaacaac 

672 cgccatcaccgggggtatggcctatgaggccatgaaccat 

429 tgctttaagtgcgggtatggcctatgaggctttaaatgaa 

480 tgcgctgaccgccggcatggccttcgaggcactcaaccac

654 tgccatgacagcaggtcaagcttatgaagccatgaataat 

435 tgcgctcaccggcggtatgtgctgggaggcgctgaacaat 

456 ctcgatgtcggccggcatggccttcgaggcgctgaaccac 

465 ctccatcaccgcgggcatggcctacgaggcactgaaccac

435 atcgctcaccggtggcatggccttggaagccatcaaccac 

459 cgcgattaccgcaggcatggcgtttgaagcgatgaatcac 

447 cgcgatgacggcgggtcaggcgtttgaagccttgaactgc 

459 cgcaattactgcgggaatggcatttgaggcattaaatcac 

435 ggcgctgaccggcggcatggcctgggaggccctgaacaac 

429 gagcattagtgcagggattttttatgaagccttaaacga- 




Figure 5 (page 9 of 25) 



STdxsdna


677


gccgaggcc — gccgg- 


— caa — c-egge 


— t— gg 


CRdxsdna


712


gcgggcttc — ctgga- 


— caa — g-aaca 


1 — ga 


CJdxsdna 


469 


ttgggtgat — tctaa- 


--att--t-cctt 


—g—cg 


PAdxsdna 


520 


gcctcggaa — gtcga- 


— cgc--c-gaca 


1 — gc 


LEdxsdna 


694


gc^--tggtt — acctg- 


— gac — t-ctgaca-- 


1 — ga 


MTdxsdna 


475


ate gec — gcatc- 


— ccg — c-egge 


— -c— gg 


RSdxs1dna


496

ggegggcac — ctgaa- 


--gaa — c-eggg 


1 — ga 


RSdxs2dna


505


gc — gggee — atctgaacaa--g-cgcc 


— t— gt 


SPCCdxsdna


475


gctggtcacttgccca* 


— aaa — caegge 


1 — gt 




ECdxsdna


499


— geg—ggega- 


--tat — c-cgtcctgatat — gc 


NMdxsdna


487 


gc — gggcg— atatg- 


— gat — g-tgga 


tttgc 




HIdxsdna


499


geggggge attg- 


— cat — a-caga 


tatgt 


SSdxsdna


475


atcgcggcc--gccaa- 


--gga — c-cagc 


c — gc 


HPdxsdna


468


-actgggcg — atagg- 


— aaatac-ccca 


1 — ga 


STdxsdna 


702


tggtgatcct c — 


-aacgacaac-gaca- 


— tgtcga 


CRdxsdna


737


ttgtgattct g — 


-aacgacaac-cagcaggtgtcgc 


CJdxsdna 


494 


taatactttt a — 


-aatgataat-gaaa- 


— tgagta 


PAdxsdna 


545 


tggtgatcct c-- 


-aacgacaac-gaca- 


— tgtcga 


LEdxsdna


719


ttgttatctt a— 


-aacgacaatagaca- 


— agtttc 


MTdxsdna


497 


tgattatcgtggtc — 


-aacgacaat-gggc- 


— gcagct 


RSdxs1dna


521 


tegtgatect g — 


-aacgacaac-gaga- 


— tgagca 


RSdxs2dna


530 


tegtgatect g — 


-aacgacaat-gaca- 


— tgagca 


SPCCdxsdna


503 


tggtcgtgct c — 


-aacgacaat-gaca- 


— tgtcga 


ECdxsdna


524 


tggtgattct c — 


-aacgacaat-gaaa- 


--tgtcga 


NMdxsdna 


512 


tggtegtect c-- 


-aacgacaac-gaaa- 


— tgtcga 


HIdxsdna


524 


tagttatttt a — 


-aatgataac-gaaa- 


— tgtcta 


SSdxsdna 


500 


tgatcatcgt cgtcaacgacaac-gagc- 


— gctcct 


HPdxsdna 


494 


tcatgatttt a — 


-aacgataat-gaaa- 


— tgagta 


STdxsdna 


732 






gt 


CRdxsdna 


770 tgcccacgcagtacaacaacaagaaccaggaccccgt


CJdxsdna


524 






at 


PAdxsdna 


575 






— - gt 


LEdxsdna 


750 








MTdxsdna 


530 








RSdxs1dna


551 






■ gt 


RSdxs2dna 


560 






• gt — 


SPCCdxsdna 


533 






— gt 


ECdxsdna 


554 






gt 


NMdxsdna 


542 






— gt— 


HIdxsdna 


554 






— gt— 


SSdxsdna 


533 








HPdxsdna 


524 









SUBSTITUTE SHEET (RULE 26) 






Figure 5 (page 10 of 25) 



STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxs1dna

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 

NMdxsdna 

HIdxsdna 

SSdxsdna 

HPdxsdna 



7 45 — gggcgggctttcggcctatcttgcgcgcctcatttcct 
807 — gggcgccctgtccagcgccctggcgcgcctgcaggcca 
537 --tggagcaatttcaaagtatctttctcaggctatggcaa 
588 --cggcgggctctccaactacctggcgaagatcctctcca
766 ctggatgggccagttgctcctgttggagctctaagtagtg

543 --cgggggcgtcgccgaccatctggccacgctg 

564 — gggggcgctgtcgtcctat ct ctcgcggctc-tatgcg 

57 3 — gggggcgcttgcgcgctatctcgtgaatctc tcct 

546 --gggtgcgctctctcgctatct gaataagattcg

567 --cggcgcgctcaacaaccatctggcacagctgctttcc-

555 --cggtgcgttgcccaaataccttgccagc aacgt 

567 --tggtgcattaaataatcatcttgcgcg tattttct 

546 --cggcggcctcgccaaccacctggccaccctgcgcacca
537 — tggagccttatccaaagcccttagccagctga — tgaa 
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783 c — gtc cga-ata 1 

8 45 a--ccg gcc-cct g 

57 5 c — gca gtt-tta 1 

626 g — ccg cac-cta 1 

806 c--tttgagcaggttacagtcta-ataggcct : 

574 c — ggc tgc-a 

601 g--gcg cgc-cgt 1 

608 c — gaa ggc-gcc c 

579 g — gtt ag 



604 g — gta- 
588 



-agc-ttt- 



c — gtg cgcgata tg 

602 ctggct ctc-ttt a 

58 4 c--cga cgg-cta cgagaaggt 

57 3 a- -ggc ccg-ttt 1 
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SPCCdxsdna 585 

ECdxsdna 615 

NMdxsdna 601 

HIdxsdna 615 

SSdxsdna 603 

HPdxsdna 584 



-ctcggc c — tgc-gcga-gc tcgcc 

-cgcgag c — tgc-gcga-ga ttgcc 

-caaagt 1 — tta-aaaa-gcgtattgct 

-agcagc a — tgc-gcga-gg gcagc 

-ctcagagaac — taa-gaga-ag tcgca 

-gccggc c — tac-gag c 

-ccagga c — ttc-aaggcgg ccgcc 

-ttcgccacgc — tgc-gcgc-gg ccgcc 

— tgagc cgatgc-agtt-gc tcacc 

-ctcttca—c — tgc-gcga 

-cacgga c — tgttgagt-ac cgtca 

-ctctacg — c — ttc-gtga-tg- 

cctcgcc t — ggg-gcaa-gg- 

-accagt c — ttt-ccgc-to 
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916 acttggcttctattatgtcggcccgatcgacggccacaat 
984 gctgggcctgtactacatcggccctgtggacggccacaac
705 attagggcttgaatatatagggcctattgatggacataat
756 gctcggctggaattacatcggcccgatcgacggccacgac
966 acttggactttactatattggtcctgtggatggtcacaac

723 cctcgggttgaagtacgtcggcccggtcgacggcca 1

735 gctgggtttctcctatgtcggcccgatcgacgggcacgat
744 gctgggcttcacctatgtcggccccatcgacggccacgac
732 gctgggcttcacctacatggggccagtggatggtcacaac
735 gctgggctttaactacatcggcccggtggacggtcacgat
753 cttcggcttccgctataccggccccgtggacggacacaac
741 actcggttttaactatattggcccagtggatgggcataac
726 cctgggcctgaagtacgtcggccccatcgacgggcacgac
705 attaggcattaactatatagggcctattaatgggc

956 ctcgagcatctgatcccggtgctggagaatgtcc-g

1024 ctggacgacctcatcgccgtgctcagcgaggtgc-g

745 ttaggtgaaattat ttctgcattaaaacaag

796 ctgccgaccctggtggctaccctgcgcaacatgc-g

1006 attgatgatctaattgcgattctcaaagaggtta-gaagt

760 gacgag cgggcggtggaggtcgcgc-t

775 ctcgaccagcttctgccggtgctgcggaccgtca-a

784 atggaggcgctcctccagacgctgcgcgcggcgc-g

772 cttgaagaactgatc gccaccttcc-g

775 gtgctggggcttatcaccacgctaaagaacatgc-g

793 gtcgaaaatctggtcgatgtattggaagacctgc-g

781 attgatgaattagtggctacgcttacgaatatgc-g

766 atcggcgcggtcgagtccgcgctgc gcc-g

740 atgatttgagcgcgattattgaaaccttaa-a

991 -c gaca gcga-gc a g ggc 

1059 -c agcg ccga-ga ccgtg ggc 

776 -c aaaa gctatgc a a aag 

831 -c gaca t-ga a g ggc 

1045 ac taaa ac-a-ac a g gtc 

786 -g cgca gcgc-gc g g cgcttc

810 -g — cage gggc-gc a 1 gcg 

819 -g — gece g-ga-cc ac — g ggg 

798 -c ga-a gegc-ac a aacacaccgga

810 -c ga cct-ga a a ggc 

828 -c ggac gc a a a ggc 

816 ta atct-ga a a ggc 

795 -c gccaagcgctt-cc a c ggg 

771 -attageca aaga-gcttaa a gag 
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ccgatcctgatccatgtcgtgaccaagaagggca 

ccggtgctggtgcacgtggtaacggagaagggcc 

ccttgtgtgatacatgctcaaaccataaagggta 

ccgcagttcctccatgtggtgaccaagaaaggca 

cag-tactgatccatgttgtcactgagaaaggca 

ggtgcaccggtgatcgtgcacgtcgtcacccgcaagggca 

ccggtgctgatccatgtcatcaccaagaagggca 

ccggtgctcatccatgtggtcacgaagaagggca 

ccagtactcgtccacgttgccacaaccaagggta 

ccgcagttcctgcatatcatgaccaaaaaaggtc 

ccgcagcttctgcacgtcatcaccaaaaagggca 

ccacaatttttgcatataaaaacgaaaaaaggta 

ccggtgctggtgcactgcctcaccgtcaagggcc 

ccggtgctaatccatgcgcaaaccttaaagggca 



agggctatgccccggccgaagcg gcggcggacaagta 

gcggctacctgcccgccgagacg gcgcaggacaagat 

aaggctatgctttagctgaagga aaacatgctaaatg 

agggcttcgccccggccgaactg gatccgatcggcta 

gaggttatccatatgctgagaga gctgcagataagta 

tgggctacccgccggccga ggccgac 

ggggctatgctccggccgaggcc gcgcgcgaccgtgg 

agggttacgcccccgccgagaat gcccccgacaagta 

agggctatccctacgctgaagaa gatcaggttggcta 

gtggttatgaaccggcagaaaaa gacccgatcacttt 

acggctacaaactcgccgaaaac- — gatcccgtcaaata 

aaggatacgcacccgcagaaaaa gatccgattggttt 

gcggctacgaacccgccctcgcccacgaggaggaccactt 
aaggctataagatcgctgaaggg cgctatgaaaaatg 

tcacggcgtccagaag--tt — cgacgt gatc-acc 

gcacggtgtggtcaag — tt — cgaccc ccgc-acc 

gcacggggtgggagcc — tt — tgatat agat-agt 

ccacgcgatcaccaag — ct--gga age- tec 

tcatggagttgccaag--tt — tgatcc agca-aca 

-caggccgagcagatgcatt — ccacggtcccgatcgatc 

ccatgccacgaacaag--tt — caacgt cctg-acc 

tcacggggtgaacaag — .tt — cgaccc cgtc-acg 

tcatgcccaaaatccc — tt — tgatct ggeg-aca 

ccacgccgtgcctaaa — tt — tgatcc ctcc-agc 

ccacgccgtcgccaac — ctgcctaaag aaag-ege 

ccacggtgtacctaaa — tt — tgatcc aatc-agt 

ccacaccgtcggcgtg — at--ggaccc gctc-acc 

gcatggggtggggcct — tt — tgattt ggat-acc 
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egg tctg-c — gcg — atcaccgcggcgatgccctc 

gca tegt-g — gcg — gtgcacgcggccatggcggg 

ata ttgt-t — ggg — gttacggcggctatgccaag 

-eg cctg~c--tcggcatcaccccggcgatgaagga 

aca ttgt-t — gca — atccatgctgccatgggggg 

cgtgacatcgt-g — gec — attaccgcggccatgccggg 

gga tctg-c — gcg— -gtgacggccgccatgccgga 

gca tegt-g — gcg — atcaccgccgctatgccctc 

cgc attgtc — ggg — attacggctgcgatggcgac 

caa gctg-atggcg — attactccggcgatgcgtga 

ccg actg-gttgcg — attacccccgccatgcgcga 

aaa ttat-a — ggt — atcacacctgcaatgcgtga 

ggaca — tegt-c — gcg — atcaccgccgcgatgctc — 
aaa tegt-a — ggc — gtaaccgcggcgatgcctag 



gggcacc gggctcg-acaagttccaggcgacg — t 

cggcacc ggcctgt-accggttcgagaagaag — t 

tggaaca ggtcttg-ataagcttatagaaaaa — t 

aggttcc gacctgg-tggcctt-cagcgaacg — t 

tgggacc ggaatga-accttttcca-tcgtcg — c 

ccccacc gggctga-ccgcgttcgggcagcgc — t 
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gggcagc ggcttgg-ttgagtttga-acaacga-t 
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cacccggtggggctcg-ccaggttc — gccgaccgct 

eggcaca ggattag-acaaactcattgacgct--t 

tccccg-atc-gcaccttcgacgtcgctatcgccgagcag 
tcccgg-acc-gcacctttgacgtgggcattgcggagcag 
atccaa-atc-gtttttgggatgtggctattgcagaacag 
tatccggaac-gctacttcgacgtcgccatcgccgaacag 
ttccca-acaaggtgttttgatgttggaatagcagaacaa 
tcccgg-atc-gattgttcgacgtcgggatcgccgagcaa 
ttccga-agc-gcaccttcgatgtgggcatcgcggaacag 
tcccga-acc-gcgtcttcgacgtgggcatcgccgagcag 
tgccga-agc-aatacatcgatgttggcattgccgaacag 
tcccgg-atc-gctacttcgacgtggcaattgccgagcaa 
tccccg-acc-gctatttcgatgtcggcatcgccgagcag 
tcccaa-aac-aatattttgacgtagcgattgcagaacag 
tcccgg-acc-gggtctgggacgtcggcatcgccgagcag 
accctt-tgc-gcttttttgatgtcgctatcgctgagcaa 
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12 98 cacgcggtcacct-tcgcagcgggccttgccgcgcagggg 

1369 cacgccgtgacct-ttgctgccggcctggcgtgcgagggc 

1084 catgcagtaactt-ctatggccgctatggcaaaagaagga 

1129 catgccgtgaccc-tggccgccggcatggcctgcgagggc 

1351 catgcagtaacct-ttgctgctggattggcttgtgaaggc 

1099 cacgcgatgacgt-cggcggccgggttggcgatgggtggg

1117 catgcggtgacct-tctcggcggcgcttgcggcaggcggc 

1126 catgccgtgacct-tcgcggccggcctcgccggggccggg

1114 cacgccgtggtgc-tagctgccggtatggcctgcgatggc 

1114 cacgcggtgacct-ttgctgcgggtctggcgattggtggg 

1147 cacgccgttacct-ttgccggcggtttggcttgcgaaggg 

1117 cacgctgtcacgt-ttgccacaggacttgcaattggcgga 

1105 cacgcggccgtgt-ccgcggccgggctcgccaccggcgga 

1084 cacgctttaacttctagcagc--gctatggctaaagaggg 

1337 atgcggccgttctgcgcg-atctactcgaccttcctgcag 
1408 ctggtgcccttctgcacc-atctacagtaccttcatgcag
1123 tttaaaccttttattgca-atatatagcacctttttgcag 
1168 atgaagccggtggtagcg-atctactcgaccttcctccag 
1390 attaaacctttctgtgca-atctattcgtctttcatgcag 
1138 ctgcaccccgtggtggcg-atctactcgacgttcctgaac 
1156 atgcggcccttctgcgcc-atctattccaccttcctccag 
1165 atgaagcccttctgcgcg-atctattcctcgttcctgcaa 
1153 atgcgtccggtggtggca-atctattccaccttcctgcag 
1153 tacaaacccattgtcgcg-atttactccactttcctgcaa 
1186 atgaagcccgtcgtggcg-atttattccacctttttacaa
1156 tataaacctgtcgtcgca-atttactcgacatttttacaa 
1144 ctgcacccggtcgtcgcc-gtctacgccaccttcctcaac 
1122 gtttaaaccttttgtgagcatctattctacttttttgcag 

1376 cgcgcctacgaccaggtcgtccacgacgtcgcgatccaga

1447 cgcggttacgaccagatcgtgcacgacgtgtccctgcaga 

1162 cgtgcttatgatcaagtgatccatgattgtgcgattatga 

1207 cgcgcctacgaccagttgatccatgacgtcgccgtgcagc 

1429 agggcttatgaccaggtagtgcatgacgttgatttgcaaa 

1177 cgggcgttcgaccagatcatgatggatgtggcgctgcaca 

1195 cgcggctacgaccagatcgtgcatgacgtggcaatccagc 

1204 cggggttacgaccagatcgcccatgacgtggcgctgcaga 

1192 cgggcctttgatcaagtcatccacgacgtttgtatccaaa 

1192 cgcgcctatgatcaggtgctgcatgacgtggcgattcaaa 

1225 cgcgcctacgaccaactggtgcacgacatcgccctgcaaa

1195 cgtgcttacgatcaattaattcacgatgttgccattcaaa 

1183 cgcgccttcgaccagctcctgatggacgtcgc cctgc 

1162 agggcttatgattctattgtgcatgacgcttgtatttcta 
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Figure 5 (page 18 of 25) 

1416 acc — tgccg-gtccgcttcgcgatcgaccgcgcgggcct 
14 87 age — tgcct-gtgcgcttcgctatggaccgcgctggcct 
1202 att — taaat-gtggtttttgctatggatagggcagggat 
12 47 acc — tcgac-gtgctgttcgccatcgaccgcgccggcct 
14 69 age — tgccc-gtgaggtttgcaatggacagagcaggtct 
1217 age — tgccg-gtcaccatggtgctggaccgtgccgggat 
1235 gcc--tgccg-gtgcgctttgccatcgaccgcgccggcct 
1244 acc — ttccc-gtccgcttcgtgatcgaccgggcggggct 
1232 age — tgccc-gtcttcttctgcctcgatcgcgcggggat 
1232 age — ttccg-gtcctgttcgccatcgaccgcgcgggcat 
12 65 acc — tgccc-gttttgtttgccgtcgaccgcgcgggcat 
1235 ate — tccct-gtgctatttgcaattgatcgagcagggat 
1220 accgctgcggtgtgaccttcgtcctggaccgggccggcgt 
12 02 gct--tgccg-attaaattagccattgacagggctgggat 

1453 ggtcggtgccgacggcgcgacccatgccggcagcttcgac
1524 ggtgggcgctgacggctccacgcactgcggcgccttcgac
1239 agtaggcgaagatggggagacgcatcaaggtgtttttgat
1284 ggtcggcgaggacggcccgacccacgccggtagcttcgac
1506 tgttggagcagatggtccaacacattgtggtgcatttgat
1254 caccggtagcgacggcgccagccacaacggaatgtgggac
1272 cgtgggggcggacggcgccacccatgcgggctcgttcgat
1281 cgtgggggccgatggcgcgacccatgcgggggccttcgac
1269 agttggcgcggatggcccgactcaccaaggcatgtacgac
1269 tgttggtgctgacggtcaaacccatcagggtgcttttgat
1302 cgtcggcgcggacggcccgacccatgccggtttgtacgat
1272 agttggtgcagatggggctacacatcaaggtgcattcgat
1260 cacgggcgtcgacggcgcctcgcacaacggcatgtgggac
1239 tgtgggcgaagatggcgagacgcaccaagggcttttagac

1493 gtgacctatctcgccagcctgcccaatttcgtggtgatgg
1564 gtgacgttcatggcgtcgctgccgcacatgatcaccatgg
1279 cttagttttttagctcctttgccaaatttcactcttttag
1324 atctcctacctgcgctgcatccccggcatgctggtgatga
1546 gttacttacatggcatgtcttcctaacatggttgtaatgg
1294 ttgtcgatgctgggtatcgtgcccggcatccgggtggcag
1312 gtggccttcctgtcgaacctgcccggcatcgtggtgatgg
1321 gttggcttcatcacttcgctgcccaacatgaccgtgatgg
1309 attgcttacctgcggctgattcccaacatggtgctgatgg
1309 ctctcttacctgcgctgcataccggaaatggtcattatga

1342 ttaagctttttgcgctgcattccgaat atgattgtcg 

1312 attagctttatgcgttgcattccaaatatgatcattatga 

1300 atgtccgtcctccaggtcgtgccc ggcctcaggatcg 

1279 gtgtcgtatttgcgctctatccctaa catggtcattt
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Figure 5 (page 19 of 25) 

1533 cggccgcggacgaggtcgag-ctcgtccacatg — accca 

1604 ctccctcgaacgaggcggag-ctcatcaacatg--gtggc 

1319 c cccaagagat gaacaaatg — atgca 

13 64 cccccagcgacgaggacgag-ctgcgcaagctg — ctcac 
1586 ctccttctgatgaagcggag-ctatttcacatggtagcaa 

1334 cgcccagagacg cca-cccggttgcgtg — aagaa 

1352 ccgccgccgacgaggccgag-ctcgtccatatg — gtagc 

1361 ccgcggccgacgaggccgag-ctcatccacatg--atcgc 

1349 caccgaaagatgaggccgaa-c tgcagcgg — atgct 

134 9 ccccgagcgatgaaaacgaa-tgtcgccagatg — ctcta 

1379 ccgcgccgagcgatgaaaat-gaatgccgcctg—ctgct 

1352 cgccgagtgatgaaaatgaa-tgccgtcaaatg — ctcta 

1337 ccgccccgcgcgacgccgac-cacgtgcgcgcc — cagct 

1316 ttgccccacgagacaatgagactttaaaaaacg — ccgtg 

1570 ca-cg gcg--g--cga--tg--cacg acag 

1641 ca-cctgcgcc — g — cca — tc — gacg ac — 

134 4 aa-at ata — a — tgg — ag — tatgcttatttacat 

14 01 ca-c eg — g — ctacctg — ttcg a 

1625 ctget gec — g — cca — tt — gatg aca- 

13 66 ct-cg gcgagg — cgc — tc — gaegteg acga 

1389 ca-cc gec — g — ccg — cc — catg acga 

1398 ca-cc gec — g — tgg — cc — ttcg gega 

1383 ag-tg acg — g — gta — tt--gaat acga 

1386 ta-c cg--g — eta — t cact ataa 

1416 tt-cg acc — t — get — at--cagg caga 

138 9 ta-ca ggt — tatcaa — tg — tgga aaac 

137 4 gc-gg gag — g — egg — tc — gecg tgga 

1354 cg-tt ttg~-c — caa — tgaacacg attc 

1591 c — g gcccgatcgcgctgc-gctatccacgcggcaac 

1663 gcgccctcgtgcttccgcttcccccgcggcaac 

1372 caag gacctattgctttgc-gttatcctag ag 

1419 t--g gcccggccgcggtgc-gctatccgcgcggcagc 

164 6 gaccaagttgtttta-gatacccaagaggaaat 

1392 c--g gcccgacggcgttac-ggttccc caaa 

1410 a — g ggcccatcgccttcc-gctatccgcgcggcgac 

1419 g — g gccccatcgccttcc-gcttcccgcggggcgag 

1404 c — g gcccgatcgccatgc-gtttcccgcgcgggaat 

14 04 c — gatggcccgtcagcggtgc-gctacccgcgtggcaac 

14 37 c--g cgcccgccgccgtcc-gctatccgcgcggcacg 

1412 c — t gc ggcagtgc-gctaccctcgcggaaat 

1395 c — g acgcgccgacgctg atccgcttcccgaa 

1377 a — a gcccttgcgcgttcc-gatacc ctag 
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Figure 5 (page 20 of 25) 

STdxsdna 1625 ggcg tcggactggc-gctgccc 

CRdxsdna 1696 ggcc tgggcctgga-cctggccgcctacggcatcagc 

CJdxsdna 1403 ggag ttttattttg-gataaag 

PAdxsdna 1453 ggcc ccaaccatcc-gatcgat 

LEdxsdna 1678 ggga tcggtgtaga-gcttccg 

MTdxsdna 1420 ggtgatgtgggagaaga-tatttc 

RSdxs1dna 1444 ggcg tgggggtcga-ggtgccg

RSdxs2dna 1453 gggg tgggcgtcga-gatgccc 

SPCCdxsdna 1438 ggta ttggcgtacc-cctgccggaag 

ECdxsdna 1441 gcgg tcggcgtgga-actg

NMdxsdna 1471 ggta cgggcgtgcc-ggtttca 

HIdxsdna 1441 gccg ttggtgtaaa-act t

SSdxsdna 1425 ggag tccgtcggcccgcggatc 

HPdxsdna 1404 gggg teg — tttgc-gttaaaa 
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1646 
1732 
1424 
1474 
1699 
1443 
1465 
1474 
1463 
1459 
1492 
1459 
1447 
1423 



aa-gg- 
aa-gg* 
aa-tt- 
cc-gg 
gctgg 
— gg 
gt-ga 
ga-gc 
aa-gg 
ac-gc 
ga-cg 
ac-tc 
cc-gg 



--t — teeggag 
--a — cctgaag 
--taatccttgt 
--a 



--a- 
--c- 
--a- 

-g- 

--c- 
— c- 

— g- 

— c- 
— c- 



c- 

g- 

g- 

ectgeaa c- 

aacaaaggaattc- 

tttggag c- 

gggcgtg c- 

egggacg g- 



-ggctg 

-gtgtgcccct 

-agata 

-eggtg 

-ctctt 

-ggcgt 

-cgetc 

-tgctg 



tg-ggag 1 cgctc- 

gctggaa a aacta- 

catggaa a ccgtg- 

tttagaa a tgctt- 

cctcgac c gggtc- 

ga-gggggt--ttttgag ectageggtttt- 



STdxsdna 1664 

CRdxsdna 1755

CJdxsdna 1444

PAdxsdna 1492

LEdxsdna 1723

MTdxsdna 1459

RSdxs1dna 1483

RSdxs2dna 1492 

SPCCdxsdna 1480 

ECdxsdna 1477 

NMdxsdna 1510 

HIdxsdna 1477 

SSdxsdna 1465 

HPdxsdna 1450 



-gaaatcggcaagggtc 
cgaggtgggcaagggtg 

-aaacttggtaagg 

-gagateggcaagg 

-gaggttggtaaaggta 
-ggaggcgtggatgtgc 
-cagatcggccgtggcc 
-gagcccggccggggcc 
-ccgattgggaaagcag 
-ccaattggcaaaggca 
-gaaateggcaagggea 
^cctattggtaaatcac 

-g gcggcctcgatg 

^gttttaggecaaag-c 



— gcgtggtccga gag 

— ttgtccgccgc cag 

cac aat 

— gcgtggtccgt eggege 

— ggatattgatt gag 

tggcggcgcccgcc gat 

-gggtggtgagc gag 

— gcgtggtgcgc gaa 

— agcaactgcgc caa 

— ttgtgaagcgt cgt 

ttatccgccgc gaa 

— gtttaattcga aaa 

— tgctgcaccgc ga 

— gaattgttgaaaaaagag 
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Figure 5 (page 21 of 25) 



STdxsdna 1694 

CRdxsdna 1786

CJdxsdna 1463

PAdxsdna 1522

LEdxsdna 1753

MTdxsdna 1492

RSdxs1dna 1513

RSdxs2dna 1522

SPCCdxsdna 1510

ECdxsdna 1507

NMdxsdna 1540

HIdxsdna 1507 

SSdxsdna 1491 

HPdxsdna 1483 



ggcaagaaggtagcgatcctgtcgctcggcacg-cgcctt 
ggcaaggacgtgtgcctggtggcgtacggcagc-agtgtg 

ggcttgtaaaaaataatagtgaaatt g-cttttt 

ggcggcagggtcgcactgctggtcttcggcgtg-cagttg 
ggggagagagtggctctattgggatatggctc — agcagt 

ggtttgaaccacgacgtcctgttggtggccatc-ggc 

ggcacgcgaatcgcgctcctgtccttcggcacc-cgtctg 
gggacggatgtcgcgatcctctccttcggcgcg-catctg 
ggcgatgatttgctgatgttggcttacggctcg-atggtc 
ggcgagaaactggcgatccttaactttggtacg-ctgatg 
ggtgagaaaaccgcattcattgccttcggcagt-atggtc 
ggtcaaaaaattgcgattttaaattttggtact-ctatta 
— cgagcggcccgaggtgctgctggtcgccgtg-ggcgtc 
ggcgaaattttactcat — aggctatggtaatggcgtggg 



STdxsdna 
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RSdxsldna 

RSdxs2dna 
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1733 
1825 
1496 
1561 
1791 
1528 
1552 
1561 
1549 
1546 
1579 
1546 
1528
1521 



gcgg- 
aacg- 
taggtt 
gcgg 
gcag-- 
gcgt — 
gccg 
cacg — 
tatc-- 
ccag — 
gccc — 
ccat — 
atgg— 
gcgg-- 



aagca- 
aggcg- 
atgga- 
aggcg- 
aactg- 
tcgca- 
aggtg- 
aggcc- 
cggcc- 
aagcg- 
ctgca- 
ccgct- 



-ctaa-aggcc- 
-ctgg-ccgcg- 
-caag-gtgtg- 
-atga-aggtc- 
-tttggatgct- 



-gcc 
-gcg 
-gca 
-gcc 
-get 



-ccga-tggcgttggcggtggcc 

-cagg-tggee gcc 

-ttgc-aggcg gcg 

-ctgc-agacg gca 

-gcga-aagtc gcc 

-ttgg-cggtc gcc 

-ttag-agtta tea 



ca-caggtctgcctcc-agacc- 
gegea ttta g- 



-gec 
-tec 



STdxsdna 1754 

CRdxsdna 1846 
CJdxsdna 1519

PAdxsdna 1582

LEdxsdna 1813

MTdxsdna 1558

RSdxs1dna 1573

RSdxs2dna 1582

SPCCdxsdna 1570

ECdxsdna 1567 

NMdxsdna 1600 

HIdxsdna 1567 

SSdxsdna 1555 

HPdxsdna 1538 



gacacgctcgaggcc — aagggcctctcgaccaccg 

gacatgctggagcgc — gatggcgtgtccaccaccg 

aaagcgtggcaagtcttaagagccttgcaagaaatgaata 

gaaagcctcgacg ecaegg 

attgtgctagaatcc — cgcggcttacaagtaacag 

aagcggctgcacaac — caggggatcggtgtgacgg 

gaggcgctggctgcg — cgcgggatctctcccacgg 

aaacttctcgaggcc — gagggggtgagcgtgaccg 

gaactgetgaatgag — cacggcatctcagctactg 

gaatcgctgaacg ccacgc 

ggaaaactgaacg ccaccg 

gaaaaactcaatg caaegg 

gagctgctccgggcc — cgcggcatcggatgcacgg 

aactggctttaaaag — aaaaaaacatagaatgege 




Figure 5 (page 22 of 25) 



STdxsdna 17 88 tcgcc — gacctgcgcttcgccaaaccg 

CRdxsdna 1880 tcatt-- gacgcgcgcttctgcaagcct 

CJdxsdna 1559 ataatgctaatttgatt—gatttaatttttgctaaacct 

PAdxsdna 1601 tcgtc--gacatgcgtttcgtcaaaccc 

LEdxsdna 18 47 ttgca — gatgcacgtttctgcaaacca 

MTdxsdna 1592 tgatc--gacccgcgctgggtgttgccg 

RSdxsldna 1607 ttgcg — gatgcgcgctttgcaaagccg 

RSdxs2dna 1616 tggcc — gacgcccgcttctcgcgcccg 

SPCCdxsdna 1604 tgatc — aatgcccgcttcgccaagccc 

ECdxsdna 1586 tggtc — gatatgcgttttgtgaaaccg 

NMdxsdna 1619 tcgcc--gatatgcgcttcgtcaaaccg 

HIdxsdna 1586 ttgtc-- gatatgcgt tttgtgaaaccg 

SSdxsdna 1589 tcgtc — gacccgcgctgggtcaagccc 

HPdxsdna 1572 tctcttggatctcaggtttttaaagcct 



STdxsdna 1814 ctcgacgaggatctgatcc-gc- 

CRdxsdna 1906 ctggacaccaagctgatcc-gct 

CJdxsdna 1597 ttagatgaagagcttttgt-gt- 

PAdxsdna 1627 ctcgacgaagccctggtac-gc- 

LEdxsdna 1873 ctggaccatgccctcataa-gg- 

MTdxsdna 1618 gtgtctgacggtgtg c-gc- 

RSdxs1dna 1633 ctcgaccgggatctgat c-

RSdxs2dna 1642 ctcgacacggggcacatcg-ac- 

SPCCdxsdna 1630 ttagatgaggaactgattgtgc- 

ECdxsdna 1612 cttgatgaagcgttaattc-tg 

NMdxsdna 1645 atagacgaagagttgattg-tc- 

HIdxsdna 1612 attgatattgaaatgatta-at- 

SSdxsdna 1615 gtcgaccccgtgctg 

HPdxsdna 1600 ttagatccaaatttaagcg-cg- 



c-gcctgctcaccaccc 
c-ggctgc-caaggagc 
gagcttgctaaaaaaag 
g-aattggcgggcagcc 
a-gccttgcaaaatcac 
g-aactggcggtgcagc 
c-tgcagctcgcggccc 
c-agctcgtgcgccatc 
c-gctggcgcgccagat 
g-aaatggccgccagcc 
c-gccttgcccgaagcc 
gtgcttgcacaa-actc 
c-ccccactcgccgccg 
a-tcgttgccccttatc 



STdxsdna 

CRdxsdna 

CJdxsdna 

PAdxsdna 

LEdxsdna 

MTdxsdna 

RSdxs1dna

RSdxs2dna 

SPCCdxsdna 

ECdxsdna 
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HIdxsdna 

SSdxsdna 

HPdxsdna 



1851 acgaagtggcggtga

1943 accctgtcatgatca 

1635 taaaatttggtttat 

1664 acgaactgctggtga 

1910 atgaagtgctaatca 

1652 acaagctgctcgtca 

1667 atcacgaggcgctca 
1679 acgcggcgctggtaa 

1668 cggcaaagtcg-tca 
1649 atgaagcgctggtca 
1682 acgaccgcatcgtta 
1649 acgattatttggtca 
1646 agcaccggctcgtcg 
1637 aaaagctctatgttt 



— cgatcgaggaa 
— ccatcgaggag 

ttttagtgaaa 

ccatcgaggaa 

ctgtcgaagaa 

cgctagaggac 

ttaccatcgaggag 
— cggtggagcag 

cctttgaggaa 

ccgtagaagaa 

— cccttgaagaa 

cattggaagaa 

ccgtcgtggag 

ttagcgataat 



— ggcgc g 

— ggctc c 

atgttaa a 

— aacgccgtg 

—ggatc a 

--aacgg g 

— ggcgc c 

--ggggc— c 

--ggctg c 

— aacgc c 

--aacgcc — g 

--aatgc a 

--gac 

— tacaa g 
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1883 
1975 
1669 
1699 
1942 
1684 
1702 
1711 
1699 
1681 
1715 
1681 
1675 
1669 

1913 
2005 
1700 
1730 
1972 
1711 
1732 
1744 
1729 
1698 
1747 
1709 
1705 
1682 

1941 
2033 
1717 
1749 
2000 
1742 
1760 
1769 
1753 
1724 
1775 
1735 
1733 
1706 



ate — ggcggccccggt-gcgc-atgtgctgacg 

gtg--ggtggcttcgct-gcgc-acgtgatgcag 

att — ggcggtatagaaagttt-aattaataatt 

atg — ggcggcgccggc-tcg gcggtcggcgagt 

att — ggaggttttgga-tctc-atgttgttcag 

gtc--aacggtggggcg-gggt-cagcggtg 

atc--ggcggcttcggc-agcc-atgtggcgcag 

atg — ggcggcttcggc-gcct-atgtcatgcactgt 

eta — cccggcggcttt-ggct-ccgcgattatg 

att--atgggc g-gegc-agg 

aacagggcggcgcaggc-agcg-cggtgctggaa 

att — caagg tgga-gcgggatctgctgttg 

aac — agccgggccgcc-gggg-tcggttcggcg 



ctt — ggagg ggt-g — 



g 



— etc gccagcgatac-cggcc — t gatcgacg 

ttc ctcgcactgga-gggcc— t gctggacg 
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cgaagtc 

gcaagttg 
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gatg 1 egggtt 

eggt atcgctcgatggt 

gege tccgggtcatgac 

cegg tgttgccgatcgg 

aaaccagtacccgtgctgaacattgg 

ttgc 1 tttggg 

a-ct tttacaacttg-g 

cegg tgcgccgcttcgg 

gcgaac 
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cctgccggaca tattccaggaccaggacaagcccga 

gctgccggacc gctacatcgaccacggcgactaccg 

tatga-agaca aatttattgaacatggaaa 

cctgcccgact actacgtcgaacacgccaagcccag 

tcttcctgatc gatacattgaccatggatctcctgt 

gccgcaggagt tctacgagcacgcgtctcgaagcga 

gctgcccgaca cgttcatcgaccacaacagcgccga 

gctgcccgacc gcttcatcgagcaggcgagccccga 

tgttcccgatc tcttggtggaacatgccagccctga 

cctgccggact tctt tattccgc 

cgttgccgata ccgtaaccggacacggcgatccgaa 

cttgccagattattttattccacaagcgacaca gcaa 

catccccgagc agttcctcgcgcacgccaggcgcgg 

aaaa tattttaaagcctgttaaaagcttt 
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-aacaagt gag g 
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-gtgctg gccgat c 
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-gacatgt-a tgccgat g 



tgaatctaaacagg-agttgggcctgacg — 

a aggaactca ggaagaa- 

a aaacttt-t agacgat- 
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t gaggtgc-t cgccgac- 



-c 
-a 
-t 
-t 
-a 
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ccgg-gctgaatgcggcc- 
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-caggacgt-- 
-gacatagag- 
-gatatcgcg- 
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c- 
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-gagatcgcc- 



catg-g — gaacaccgctttagtggaaaaatccttaggat 
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— gccgctggta tggaag 
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— g-ggeggate gg — eg 

tagacacagagagtttgactg 
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aaag 
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ag gectgg 

c gcgtgg 



gg gaggaa- 

tttag gacaag- 
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ctgggtac-cgg gg-t 
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ccggcaga-cggcaaagc-c 
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1845 1— taa 
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1935 gcgggcg — gtctga 

1910 ga 

1857 ggca — taa 

1903 gcggcaaattaa 

1870 a-attta — taa 

1887 gcccgca — tga 
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617 — iavigdgamsagmayeamnna-eaagnr-lvvilndnd 
145 --iavigdgaltagmayealnnaghirpdr-fivilndne
139 --ipiigdgaltggmalealnhi-gdekkd-mivilndne
218 --iavigdgaitggmayeamnha-gfldkn-mivilndnq
137 --valigdgalsagmayealnel-gdskfp-cvillndne
154 --vavigdgaltagmafealnha-sevdad-mlvilndnd
212 --iavigdgamtagqayeamnna-gyldsd-mivilndnr
139 --vavvgdgaltggmcwealnni-aatprp-vvivvndng
139 --vavvgdgaltggmcwealnni-aasrrp-viivvndng
147 --iaiigdgsitagmayealnha-ghlksr-mfvilndnd
146 --vavigdgsmsagmafealnhg-ghlknr-vivilndne
149 --iavigdgsitagmayealnha-ghlnkr-lfvilndnd
139 --vavigdgsltggmaleainhaghlpktr-llvvlndnd
139 --vsiigdgaltggmaleainhaghlphtr-lmvilndne
133 --vvvigdgaltsgmalealnql-knlnsk-mkiilndng
147 --vcvigdgaitagmafeamnha-gdirpd-mlvilndne
143 --vaiigdgamtagqafealnca-gdmdvd-llvvlndne
147 --vcvigdgaitagmafealnha-galhtd-mlvilndne
621 vhiaiigdggltggmalealnyi-sflnsk-iliiyndng
139 --vaviggraltggmawealnni-aaakdqpliivvndne
137 --iallgdgsisagifyealnel-gdrkyp-mimilndne

725 msiap pvgglsayl — arlissseyl — gl 

182 msisp nvgaistyl — nriisghfvq — et 

175 msiap nvgaihsml — grlrtagkyq — wv 

254 qvslptqynnknqd-pvgalssal — arlqanrplr — el 

17 3 msisk pigaiskyl — sqamatqfyq — sf 

190 msish nvgglsnyl — akilssrtys — sm 

248 qvslptatldgpva-pvgalssal — srlqsnrplr — el 

175 rsyap tiggvadhl — atlrlqpaye — rl 

175 rsyap tiggvadhl — atlrlqpay 

183 msiap pvgalqhyl — ntiarqapfa — al 

182 msiap pvgalssyl — srlyagapfq — df 

185 msiap pvgalaryl — vnlsskapfa — tl 

17 6 msisp nvgalsryl — nk-irvsepm — ql 

17 6 msisp nvgaisrylnkvrlsspmqf ltdnl 

169 msisp nvgglayhl — sklrtspiyl — kg 

183 msise nvgalnnhl — aqllsgklys — si 

179 msisp nvgalpkyl— asnwrdmh gl 

183 msise nvgalnnhl — arifsgslys — tl 

659 qvslptnavsisgnrpigsisdhl — hyfvsnie 

17 6 rsyap tigglanhl — atlrttdgye — kv 

173 msist pigalskal — sqlmkgpfyq — sf 
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803 relakrf trk — lsr rltaa a-gkaeef 

208 rqkiknf lqh--fge tplri m-klteef 

201 kdeleyl fkk — ipavgg klaat a-ervkds 

28 9 reiakgv tkq — lpd vvqka t-akidey 

199 kkriakm ldi — lpd satym a-krfees 

216 regsk k — vis rlpgaweia-rrteey 

28 3 revakgv tkq — igg pmhel a-akvdey 

201 lekg rd — alh slpli g-qiayrf 

198 -eqalet grd — lvr avplv g-glwfrf 

209 kaaaegi emh — lpg pvrdg a-rrarqm 

208 kaaakga lgl — lpe pfqeg a-rrakem 

211 raaadgl eas — lpg plrdg a-rrarql 

201 — ltdgl tqg — mqqipfvggaitqg f-epvkeg 

206 eeqikhl pf vgd sltpe m-ervkeg 

195 kkvlkkv lekteigf eveee m-kylrds 

209 reggkkv fsg — vp pikel 1-krteeh 

204 lstvkaq tgk — vld kipgamef a-qkvehk 

209 rdgskki ldk — vp piknf m-kkteeh 

691 anag dnk — Isk n 

202 lawgkdvllrtpi — vgh plyea Ihgakkgf 

199 rskvkki 1st — lpe svnyl a-srfees 

878 argm — atg g tlf eelgfyyvgpidg 

233 lkgl — isp g vif eelgfnyigpidg 

229 lkym — lvs g mff eelgf tylgpvdg 

314 argmisgtg s tlf eelglyyigpvdg 

224 fk-l~itp g llf eelgleyigpidg 

240 akgm — Ivp g tlf eelgwnyigpidg 

308 argmisgsg s tlf eelglyyigpvdg 

222 mhsv — kagikdslspq llf tdlglkyvgpvdg 

222 lhsv — kagikdslspq llf tdlglkyvgpvdg 

234 vtam — pgg a tlf eelgf dyigpvdg 

233 lksv — tvg g tlf eelgf syvgpidg 

236 vtgm — pgg g tlf eelgf tyvgpidg 

230 mkrl — syski g avf eelgf tymgpvdg 

230 mkrl — vvpkv g avieelgf kyf gpidg 

222 lkgm— iqg 1 nf feslglkyf gpfdg 

233 ikgm — vvp g tlf eelgf nyigpvdg 

232 iktl — aee aehakqslslf enf gf rytgpvdg 

233 mkgvmf spe s tlf eelgf nyigpvdg 

702 ake n nif enlnydyigvvng 

231 kdaf — apq — ■ g mf edlglkyvgpidg 

224 fk-1 — itp g vf f eelginyigping 
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STdxsp 950 hnlehlipvlenvrdse-q-gpilihvvtkkgkgyapaea 

AAdxsp 257 hdikaledtlnnvkdi--k-gpvllhvytkkgkgykpaee

BSdxsp 253 hsyhelienlqyakkt--k-gpvllhvitkkgkgykpaet

CRdxsp 340 hnlddliavlsevrsae-tvgpvlvhvvtekgrgylpaet

CJdxsp 247 hnlgeiisalkqak-am-q-kpcvihaqtikgkgyalaeg

PAdxsp 264 hdlptlvatlrnmrdm--k-gpqflhvvtkkgkgfapael

LEdxsp 334 hniddliailkevrstk-ttgpvlihvvtekgrgypyaer

MLdxsp 253 hd-ehavevalrkargf-g-gpvivhvvtrkgmgyppaea

MTdxsp 253 hd-eravevalrsarrf-g-apvivhvvtrkgmgyppaea

RCdxsp 258 hdmaelvetlrvtrara-s-gpvlihvcttkgkgyapaeg

RSdxs1p 257 hdldqllpvlrtvkqra-h-apvlihvitkkgrgyapaea

RSdxs2p 260 hdmeallqtlraarart-t-gpvlihvvtkkgkgyapaen

SPCCdxsp 256 hnleeliatfreah-kh-t-gpvlvhvattkgkgypyaee

SPdxsp 256 hslqelidtfkqa-ekv-p-gpvfvhvsttkgkgydlaek

TMdxsp 246 hniellekvfkrirdyd-y-ssv-vhvvtkkgkgftaaee

ECdxsp 257 hdvlglittlknmrdl--k-gpqflhimtkkgrgyepaek

NMdxsp 263 hnvenlvdvledlr-gr-k-gpqllhvitkkgngyklaen

HIdxsp 259 hnidelvatltnmrnl--k-gpqflhiktkkgkgyapaek

PFdxsp 722 nnteelfkvlnnikenklk-ratvlhvrtkksndfinsks

SSdxsp 254 hdigavesalrrak-rf-h-gpvlvhcltvkgrgyepala

HPdxsp 247 hdlsaiietlklakelk-e--pvlihaqtlkgkgykiaeg

STdxsp 1064 -aadkyhgvqk fd--vitg-aqaka pp

AAdxsp 294 -npvkwhgvap yk--vesg-eiik ks

BSdxsp 290 dtigtwhgtgp yk--intg-dfvkp ka

CRdxsp 379 -aqdkmhgvvk fd--prtg-kqvqa kt

CJdxsp 284 -khakwhgvga fd--idsg-esvkk sd

PAdxsp 301 -dpigyhaitk le--apgs-apkkt

LEdxsp 373 -aadkyhgvak fd--patg-kqfka sa

MLdxsp 290 dqaeqmhtcgv md--pttg-qptki

MTdxsp 290 dqaeqmhstvp id--patg-qatkv

RCdxsp 296 -aedklhgvsk fd--ietg-kqkks ip

RSdxs1p 295 -ardrghatnk fn--vltg-aqvkp vs

RSdxs2p 298 -apdkyhgvnk fd--pvtg-eqkks va

SPCCdxsp 293 -dqvgyhaqnp fd--latgkakpas kp

SPdxsp 293 -dqvgyhaqsp fn--lstgkaypss kp

TMdxsp 283 -nptkyh sas ps

ECdxsp 294 -dpitfhavpk fd--pssg-clpks sg

NMdxsp 300 -dpvkyhavan lp--kesa-aqmpsekepkpa

HIdxsp 296 -dpigfhgvpk fd--pisg-elpk nn

PFdxsp 761 -pisilhsikkneifpfdttilng-nihke nkiee

SSdxsp 291 heedhfhtvgv md--pit--cepls pt

HPdxsp 284 -ryekwhgvgp fd--ldtg-lskks ks

STdxsp 1268 rtfdvaiaeqhavtfaaglaa-qgmrpfcaiystflqray

AAdxsp 361 rffdvgiaeqhactfaaglaa-eglrpvaayystflqray

BSdxsp 359 rmfdvgiaeqhaatmaaamam-qgmkpflaiystflqray

CRdxsp 447 rtfdvgiaeqhavtfaaglac-eglvpfctiystfmqrgy

CJdxsp 352 rfwdvaiaeqhavtsmaamak-egfkpfiaiystflqray

PAdxsp 367 ryfdvaiaeqhavtlaagmac-egmkpvvaiystflqray

LEdxsp 441 rcfdvgiaeqhavtfaaglac-egikpfcaiyssfmqray

MLdxsp 357 rlfdvgiaeqhamtsaaglam-grmhpvvaiystflnraf

MTdxsp 357 rlfdvgiaeqhamtsaaglam-gglhpvvaiystflnraf

RCdxsp 364 rvfdvgiaeqhavtfaagmaa-aglkpflalyssfvqrgy

RSdxs1p 363 rtfdvgiaeqhavtfsaalaa-ggmrpfcaiystflqrgy

RSdxs2p 366 rvfdvgiaeqhavtfaaglag-agmkpfcaiyssflqrgy

SPCCdxsp 362 qyidvgiaeqhavvlaagmac-dgmrpvvaiystflqraf

SPdxsp 362 qyvdvgiaeqhavtlaagmac-egirpvvaiystflqrgy

TMdxsp 342 rffdlgiteqtcvtfgaalgl-hgmkpvvaiystflqray

ECdxsp 362 ryfdvaiaeqhavtfaaglai-ggykpivaiystflqray

NMdxsp 373 ryfdvgiaeqhavtfagglac-egmkpvvaiystflqray

HIdxsp 363 qyfdvaiaeqhavtfatglai-ggykpvvaiystflqray

PFdxsp 871 nvydvgiaeqhsvtfaaamamnkklkiqlciystflqray

SSdxsp 359 rvwdvgiaeqhaavsaaglat-gglhpvvavyatflnraf

HPdxsp 352 rffdvaiaeqhaltsssamak-egfkpfvsiystflqray

STdxsp 1385 dqvvhdvaiqnlpvrfaidraglvgadgathagsfdvtyl 

AAdxsp 400 dqvihdvalqnlpvtfaidraglvgddgpthhgvfdlsyl 

BSdxsp 398 dqvvhdicrqnanvfigidraglvgadgethqgvfdiafm

CRdxsp 486 dqivhdvslqklpvrfamdraglvgadgsthcgafdvtfm

CJdxsp 391 dqvihdcaimnlnvvfamdragivgedgethqgvfdlsfl

PAdxsp 406 dqlihdvavqhldvlfaidraglvgedgpthagsfdisyl

LEdxsp 480 dqvvhdvdlqklpvrfamdraglvgadgpthcgafdvtym

MLdxsp 396 dqimmdvalhklpvtmvidragitgsdgpshngmwdlsml

MTdxsp 396 dqimmdvalhklpvtmvldragitgsdgashngmwdlsml

RCdxsp 403 dqlvhdvalqnlpvrlmidraglvgqdgathagafdvsml

RSdxs1p 402 dqivhdvaiqrlpvrfaidraglvgadgathagsfdvafl

RSdxs2p 405 dqiahdvalqnlpvrfvidraglvgadgathagafdvgfi

SPCCdxsp 401 dqvihdvciqklpvffcldragivgadgpthqgmydiayl

SPdxsp 401 dqiihdvciqklpvff cldragivgadgpthqgmydiayl 

TMdxsp 381 dqiihdvalqnapvlfaidrsgvvgedgpthhglfdinyl 

ECdxsp 401 dqvlhdvaiqklpvlfaidragivgadgqthqgafdlsyl 

NMdxsp 412 dqlvhdialqnlpvlfavdragivgadgpthaglydlsfl 

HIdxsp 402 dqlihdvaiqnlpvlfaidragivgadgathqgafdisfm 

PFdxsp 911 dqiihdlnlqniplkviigrsglvgedgathqgiydlsyl 

SSdxsp 398 dqllmdvalhrcgvtfvldragvtgvdgashngmwdmsvl 

HPdxsp 391 dsivhdacisslpiklaidragivgedgethqglldvsyl 
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STdxsp 1667 igkg-r-vvr 

AAdxsp 492 igtw-e-ell

BSdxsp 492 igtw-e-vlr

CRdxsp 587 vgkg-v-vrr

CJdxsp 483 lgka-qwlvk

PAdxsp 499 igkg-v-vrr

LEdxsp 576 vgkg-r-ili

MLdxsp 489 vdvl-a-vpa

MTdxsp 489 vdvl-a-apa

RCdxsp 497 igkg-r-vmt

RSdxs1p 496 igrg-r-vvs

RSdxs2p 499 pgrg-r-vvr

SPCCdxsp 495 igka-e-qlr

SPdxsp 496 igka-e-ilr

TMdxsp 475 idlgwk-ilk

ECdxsp 494 igkg-i-vkr

NMdxsp 505 igkg-i-irr

HIdxsp 494 igks-r-lir

PFdxsp 1020 ddvdkyseeymdddnfiksfigks-r-iikmdnennntne

SSdxsp 488 vggl-d-vlhrd

HPdxsp 485 lgqs-e-llk

STdxsp 1691 eg--kk--vailslgtrlaealkaadtlea

AAdxsp 500 eg--ed--cvilavgypvyqalraaeklyk

BSdxsp 500 pg--nd--aviltfgttiemaieaaeelqk

CRdxsp 595 qg--kd--vclvaygssvnealaaadmler

CJdxsp 492 nn--se--iaflgygqgvakawqvlralqe

PAdxsp 507 rg--gr--vallvfgvqlaeamkvaeslda

LEdxsp 584 eg--er--vallgygsavqncldaaivles

MLdxsp 497 tglaqd--vllvgvgvfasmalavakrlhn

MTdxsp 497 dg--lnhdvllvaigafapmalavakrlhn

RCdxsp 505 eg--te--vailsfgahlaqalkaaemlea

RSdxs1p 504 eg--tr--iallsfgtrlaevqvaaealaa

RSdxs2p 507 eg--td--vailsfgahlhealqaakllea

SPCCdxsp 503 qg--dd--llmlaygsmvypalqtaellne

SPdxsp 504 sg--dd--vlllgygsmvypalqtaellhe

TMdxsp 484 rg--re--aaiiatgtilnevlkip

ECdxsp 502 rg--ek--lailnfgtlmpeaakvaeslna

NMdxsp 513 eg--ek--tafiafgsmvapalavagklna

HIdxsp 502 kg--qk--iailnfgtllpsalelseklna

PFdxsp 1058 hyssrgdtqtkk--kk--vcifnmgsmlfnvinaikeiek

SSdxsp 498 er--pe--vllvavgvmaqvclqtaellra

HPdxsp 493 ke--ge--illigygngvgrahlvqlalke
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HPdxsp 

Figure 6 (page 17 of 18)

STdxsp 1769 k glsttvadlrfakpldedlirrll--tthevavt

AAdxsp 526 e girvgvvnarfvkpmdekmlrdla--nrydtfit

BSdxsp 526 e glsvrvvnarfikpidekmmksil--keglpilt

CRdxsp 621 d gvsttvidarfckpldtklirsaa--kehpvmit

CJdxsp 518 m nnnanlidlifakpldeellcela--kkskiwfi

PAdxsp 533 tvvdmrfvkpldealvrela--gshellvt

LEdxsp 610 r glqvtvadarfckpldhalirsla--kshevlit

MLdxsp 525 q gigvtvidprwvlpvcdgvl-ela--hthklivt

MTdxsp 525 q gigvtvidprwvlpv-sdgvrela--vqhkllvt

RCdxsp 531 e gvsttvadarfcrpldtdlidrli--eghaalit

RSdxs1p 530 r gisptvadarfakpldrdlilqla--ahhealit

RSdxs2p 533 e gvsvtvadarfsrpldtghidqlv--rhhaalvt

SPCCdxsp 529 h gisatvinarfakpldeelivpla--rqigkvvt

SPdxsp 530 h gieatvvnarfvkpldtelilpla--erigkvvt

TMdxsp 505 ldvtvvnaltvkpldtavlkeia--rdhdliit

ECdxsp 528 tlvdmrfvkpldealilema--ashealvt

NMdxsp 539 tvadmrfvkpideelivrla--rshdrivt

HIdxsp 528 tvvdmrfvkpidieminvla--qthdylvt

PFdxsp 1094 eqyishnysfsivdmiflnpldknmidhvikqnkhqylit

SSdxsp 524 r gigctvvdprwvkpv-dpvlppla--aehrlvav

HPdxsp 519 k niecalldlrflkpldpnlsaiva--pyqklyvf



1868 
559 
559 
654 
551 
561 
643 
557 
557 
564 
563 
566 
562 
563 
536 
556 
567 
556 

1134 
556 
552 



ieega-i-ggpgahv-- 
vednt-vvggfgsgv-- 
ieeav-leggf gssi-- 
ieegs-v-ggf aahv-- 
f senvki-ggiesli-- 
ieena-vmggagsav-' 
veegs-i-ggf gshv- 
ledng-vnggvgaav- 
ledng-v-nggagsa- 
leqga-m-ggf gamv- 
ieega-i-ggf gshv- 
veqga-m-ggf gayv- 

feegc-1 pggfg- 

meegc-lmggf gsav- 
veeamki-ggf gsfv- 



--ltlasdtglida-glklrtmr 

--lef faregimk rvinlg 

— lef ahdqg — ey-htpidrmg 
--mqflaleglldg-glkf rpmt 

--nnf lqk ydl-hvkvvsf e 

--gef lasegl evpllqlg 

— vqfmaldglldg-klkwrpiv 

— stalrq vei-dtpcrdvg 

— vsaalrraeid vpcrdvg 

— lhylartgqlek-grairtmt 
--aqllaeagvfdr-gf ryrsmv 
— mhclansggfdg-glalrvmt 
—saimeslqahdl-qvpvlpig 
— aealmdnnvl vplkrlg 



-aqrlqemgwqg kivnlg 

veena-imggagsgvnevlmahrkpvpvlni-g 

leena-eqggagsav levlakhgickp-vlll g 

leena-iqggagsav aevlnssgksta-llql g 

yednt-i-ggf sthf nnyliennyitkhnlyvhniy 

vednsra-agvgsav alalgda dv-dvpvrrfg 

sdnyk-l-ggvasai lef lseqnilk pvksf e 
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Figure 6 (page 18 of 18) 

STdxsdna 1967 lpdifqdqdkpekqydeaglnaanivdtvl-k-al-ryne

AAdxsp 590 vpdrfiehgkqdilrnlvgidaegiekavr-d-al-kggr

BSdxsp 591 ipdrfiehgsvtalleeigltkqqvanrir-l-lm p

CRdxsp 687 lpdryidhgdyrdqlamagltsqhiastal-t-tlgrakd

CJdxsp 582 yedkfiehgkts eveknlekdvnslltk-vl-kf yh

PAdxsp 592 lpdyyvehakpsemlaecgldaagiekavr-q-rl-drq-

LEdxsp 676 lpdryidhgspvdqlaeagltpshiaatvf-n-il-gqtr

MLdxsp 588 lpqefydhasrsevladlgltdqdvarrit-gwvv-afgh

MTdxsp 588 lpqefyehasrsevladlgltdqdvarrit-g-wv

RCdxsp 597 lpdcyidhgspeemyawagltandirdtal-a-aa-rpsk

RSdxs1p 596 lpdtfidhnsaevmyataglnaadierkal-e-tl gv

RSdxs2p 599 lpdrfieqaspedmyadaglraediaatar-g-al-argr

SPCCdxsp 593 vpdllvehaspdeskqelgltprqmadril-e kfgs

SPdxsp 594 vpdilvdhatpeqstvdlgltpaqmaqnim-a-sl-fkte

TMdxsp 567 vedlfvphggrkellsmlgldsegltktvl-tyik

ECdxsp 587 lpdffipqgtqeemraelgldaagmeaki k

NMdxsp 598 vadtvtghgdpkkllddlglsaeaverrvr-a-wl sd

HIdxsp 587 lpdyfipqatqqealadlgldtkgieekil-n-fi-a-kq

PFdxsp 1168 lsnepiehasfkdqqevvkmdkcslvnrik-n-yl-knnp

SSdxsp 587 ipeqflaharrgevladigltpveiagrig-a-sl-pvre

HPdxsp 582 iidefimhgntalvekslgldtesltdail-k-dl-gqer

STdxsdna 2078 a e— 1— ad gvra* 

AAdxsp 627 1 i 

BSdxsp 625 p k— t--hk gigs 

CRdxsp 725 a a — kfsls alqa 

CJdxsp 616 

PAdxsp 628 

LEdxsp 713 e a — 1 — ev mt 

MLdxsp 626 c g--s--gddagqygprssqtm

MTdxsp 621 a a — 1 — gt gvcasdaipehld 

RCdxsp 634 sv r — i — vh sa 

RSdxs1p 631 e v--1--ar ra

RSdxs2p 636 vmplrq — t — ak prav 

SPCCdxsp 628 r q — r — ig aasa 

SPdxsp 631 t esvv — ap gvs 

TMdxsp 601 a r — s — re gkv 

ECdxsp 617 a w — 1 — a 

NMdxsp 633 r d — a — an 

HIdxsp 623 g n — 1 

PFdxsp 1205 t 

SSdxsp 624 e — p — ae eqpa 

HPdxsp 619 
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Figure 7 

1 cgacggcccg gtagccccgg cgcggctgca gcaccgtcag acgtccgccg 
51 agaaagccgt cggaagtcaa ttcgtccggg gcgaacatca gggggtcgtc 
101 gggatgccgt tgtcggacat cacccggcag gcgcgatccc agtcttcttc 
151 cgggacaaac agacgccgcg gcaatatgcc gatggagcct tcgaggacgc 
201 tcatgtggac gtccaccgga aaggcgtcta tatcctcgcc ctgaaggagc 
251 gcggtggcga aggcgatgat cgtcgggtcg gtcgtgcgca acagttcctt 
301 catgtcgggg acattgtcgg caacgcctcg gtttgtcgag gccggttcgt 
351 cgaccgggtg gcaggatcgg gatgggattg gacgaggttt cgcaaaagcc 
401 gcatgaacgg ctcgccgcgt ggctggccga ggacatggcc gccgtcaacg 
451 ggctgatccg cgagcggatg gcctcgaaac acgcgccccg cattcccgag 
501 gtcacggcgc atctggtcga ggccggcggc aagcggctgc ggccgctcct 
551 gacgctcgcc gcggcgcggc tgtgcggcta cgaggggccc tatcacatcc 
601 atctggccgc gacggtggag ttcatccaca cggcgacgct gcttcacgac 
651 gatgtggtgg acgaaagcca ccgccgccgc ggcaaaccca cggcgaacct 

701 gctgtgggac aacaaatcct cggtgctggt gggcgactat ctcttcgccc
751 gcagcttcca gctgatggtc gagaccggct cgcttcgcgt gatggacatc 
801 ctcgccaatg cctcggccac catctccgag ggcgaggtgc tgcagctgac 

851 cgcggcccag gatctgcgca cgaccgagga catccacctg caggtggtgc
901 gcggcaagac ggccgcgctc tttgccgcgg caaccgaggt gggcggcgtg 
951 gtcgcgggcg tgcccgaggc gcaggtcgag gcgctccacg cctacgggga 

1001 cgcgctgggg atcgccttcc agatcgtcga cgacctcctc gattatggcg 
1051 gcgtggatgc ccagatcggc aagaacaccg gcgacgactt ccgcgaacgc 
1101 aagctgacgc tgccggtcat caaggcggtg gcccaggccg atgccgagga 
1151 gcgcgccttc tggcagcggg tgatcgagaa gggcgaccag cgcgagggtg 
1201 acctcgagca agcccatgcg atcatgtccc gccacggcgc catggaggcc 
1251 gcccggcagg atgcgctccg ctgggtcacg gtggcgcgcg aggcactcgg 
1301 ccagctgccg gagcacccgc tgcgcgagat gctgcacgat ctggccgatt 
1351 tcgtggtcga acgcatcgcc tgatcccttc cgggcgctct gccccggcgc 
1401 agcgcaggat cccgcgctgc gcccctttcg gccttccgac agtccctctg 
1451 ccgcgggagg ccggcctcgc ctgagaagcc gcactggccg ccggtcttcc 
1501 cccgaaccgc tcccgggcct gctcggaagg cgtccgccgc aaaagccccc 
1551 gcgggggggc cccaccggcg gccatcagga agagaccgtt gaagcggccc 
1601 gctcgaatcc tgtcgcgccc ccccccgacc gggcggctct ccgatccgtg 
1651 ttcgctcggc gatggacagc cgttccctgt ccgttcatga tggcgccatg 
1701 cagaccctta ccgttcccga ttccggcctc gccccctcct gcccggccaa 
1751 aggctcgccc gcggcgtctg ccgccatctg cgcagccatg atttcgtctc 
1801 ggtggtcgaa ctcgtgcccg cgcccggcct cagggtcgac gtgatggcgc 
1851 tggggcccaa gggcgagatc tgggtggtgg aatgcaaatc ctcgcgcgcg 
1901 gactatcagt ccgaccgcaa gtggcagggc tatctcgact ggtgcgaccg 
1951 cttcttcttc gcggtggacg aggaccagcc cgggccgtcg (SEQ ID NO:37)






Figure 8 

1 atgggattgg acgaggtttc gcaaaagccg catgaacggc tcgccgcgtg 

51 gctggccgag gacatggccg ccgtcaacgg gctgatccgc gagcggatgg 

101 cctcgaaaca cgcgccccgc attcccgagg tcacggcgca tctggtcgag 

151 gccggcggca agcggctgcg gccgctcctg acgctcgccg cggcgcggct 

201 gtgcggctac gaggggccct atcacatcca tctggccgcg acggtggagt 

251 tcatccacac ggcgacgctg cttcacgacg atgtggtgga cgaaagccac 

301 cgccgccgcg gcaaacccac ggcgaacctg ctgtgggaca acaaatcctc 

351 ggtgctggtg ggcgactatc tcttcgcccg cagcttccag ctgatggtcg 

401 agaccggctc gcttcgcgtg atggacatcc tcgccaatgc ctcggccacc 

451 atctccgagg gcgaggtgct gcagctgacc gcggcccagg atctgcgcac 

501 gaccgaggac atccacctgc aggtggtgcg cggcaagacg gccgcgctct 

551 ttgccgcggc aaccgaggtg ggcggcgtgg tcgcgggcgt gcccgaggcg 

601 caggtcgagg cgctccacgc ctacggggac gcgctgggga tcgccttcca 

651 gatcgtcgac gacctcctcg attatggcgg cgtggatgcc cagatcggca 

701 agaacaccgg cgacgacttc cgcgaacgca agctgacgct gccggtcatc 

751 aaggcggtgg cccaggccga tgccgaggag cgcgccttct ggcagcgggt 

801 gatcgagaag ggcgaccagc gcgagggtga cctcgagcaa gcccatgcga 

851 tcatgtcccg ccacggcgcc atggaggccg cccggcagga tgcgctccgc 

901 tgggtcacgg tggcgcgcga ggcactcggc cagctgccgg agcacccgct 

951 gcgcgagatg ctgcacgatc tggccgattt cgtggtcgaa cgcatcgcct 

1001 ga (SEQ ID NO:38)

Figure 9

1 mgldevsqkp herlaawlae dmaavnglir ermaskhapr ipevtahlve
51 aggkrlrpll tlaaarlcgy egpyhihlaa tvefihtatl lhddvvdesh
101 rrrgkptanl lwdnkssvlv gdylfarsfq lmvetgslrv mdilanasat
151 isegevlqlt aaqdlrtted ihlqvvrgkt aalfaaatev ggvvagvpea
201 qvealhaygd algiafqivd dlldyggvda qigkntgddf rerkltlpvi
251 kavaqadaee rafwqrviek gdqregdleq ahaimsrhga meaarqdalr
301 wvtvarealg qlpehplrem lhdladfvve ria (SEQ ID NO: 39)
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Figure 10 

1 ggatcgcgca gcgcctcggc cacgcgcacc atcagcagca gattgccgtt 

51 cggcagccgc gcgaagccgg ggttgaaggc gccaaggaca taggtcgcgt 

101 cgtccacccc ctcgcgcagc ggtgagcggg tcaggtcgac attgtcgggc 

151 cggaagatca gataatcgtc gctcaagcgc ttgccccctc gggtttcacg 

201 cccagcaacg gggtcaggcc ccgggggttc cggcttcagc gccggcttcc 

251 tgggcctggc ggtggtgccg gatcacctcg tcgatgatga agcgcaggaa 

301 tttctcggaa aattcggggt cgagatcggc atcctgcgcc agcgcgcgca 

351 gccgggcgat ctgcgcctcc tcgcggccgg gatcggcggg cggcagcccg 

401 gattcggcct tgtagcgccc caccgcctgg gtcaccttga accgctcggc 

451 gagcatgaag acgagcgccg catcgatatt gtcgatgctc tggcgatagc 

501 gggtcagcgt cgcgtcggtc atgcgaatct cctttgccgc tgcggcacgg 

551 ccatgcaagc acctcttgcc tttgcaatgc acaaaggcca gaggctcgtt 

601 gcatatgagc gcaaccgtcc accgcctggg ctcgcgaacc cagccttcgc 

651 tcgatccgat catggcgctg gtcgcccagg acatgaacct ggtgaacgcg 

701 gtgatcctcg atcgcatgca gtccgagatc ccgctgatcc ccgaactcgc 

751 cggccatctg atcgctggcg gcggcaagcg gatgcggccg atgctgacgc 

801 tcgccagcgc ccggctgctc ggctattcgg gcacgcgcca ccacaagctg 

851 gcggcggcag tggagttcat ccacaccgcg acgctgctgc atgacgacgt 

901 ggtcgacagc tcggacctgc gccgcggccg ccgcaccgcc aacatcatct 

951 ggggcaatcc cgccagcgtg ctggtcggcg acttcctgtt cagccgctcg 

1001 ttcgagctga tggtcgaggc cgaaagcctc aaggcgctgc acatcctgtc 

1051 gaacgccagc gcggtgatcg ccgagggcga agtcaaccag ctgaccgcgg 

1101 tgcgccggat cgacctgtcc gaggatcgct atctcgacat catcggcgcc 

1151 aagactgcgg cgctgttcgc cgccgcctgc cgggtggcgg gcgtggtcgc 

1201 cgagcgtccc gaggcggagg aactcgcgct cgacgcctat ggccgcaacc 

1251 tcggcatcgc tttccagctg gtcgacgacg cgatcgacta tgtctcggac 

1301 gcgtcgacga tgggcaagga tgccggcgac gatttccgcg aaggcaagat 

1351 gacgctgccg gtggtcctgg cgtacgcgcg cggcgacgag gcggaacgcg 

1401 gcttctggaa ggaagcgatt tcgggccgcc gcatctcgga cgaggatttc 

1451 gccgaggcga tccggctggt gcagagctgc cgcgcggtgg acgacacgct 

1501 cgcccgtgcc cgccattacg gccagctcgc gatcgatgcg ctgggcggct 

1551 tccgcgcctg cgaggcgaag gacgcgatgg tcgaggcggt cgaattcgcg 

1601 gtggcgcgcg cctactgacg cgcgccgacc ggagcatttc cgggtggatc 

1651 gcttgcgatc caaggctcgg gaaatgcgac catcaaaaag cttccgggga 

1701 ttacgcctcg gtcgactttt cttcgccctc gtcctcgtcg acttcgagcg 
1751 cgtcttcctc gtccatgtcg agcactacct cgatgccctc gacgatcagg
1801 tcgagctgct cgtagctcgc cgtcatctcg atc (SEQ ID NO: 40)
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1 atgagcgcaa ccgtccaccg cctgggctcg cgaacccagc cttcgctcga 

51 tccgatcatg gcgctggtcg cccaggacat gaacctggtg aacgcggtga 

101 tcctcgatcg catgcagtcc gagatcccgc tgatccccga actcgccggc 

151 catctgatcg ctggcggcgg caagcggatg cggccgatgc tgacgctcgc 

201 cagcgcccgg ctgctcggct attcgggcac gcgccaccac aagctggcgg 

251 cggcagtgga gttcatccac accgcgacgc tgctgcatga cgacgtggtc 

301 gacagctcgg acctgcgccg cggccgccgc accgccaaca tcatctgggg 

351 caatcccgcc agcgtgctgg tcggcgactt cctgttcagc cgctcgttcg 

401 agctgatggt cgaggccgaa agcctcaagg cgctgcacat cctgtcgaac 

451 gccagcgcgg tgatcgccga gggcgaagtc aaccagctga ccgcggtgcg 

501 ccggatcgac ctgtccgagg atcgctatct cgacatcatc ggcgccaaga 

551 ctgcggcgct gttcgccgcc gcctgccggg tggcgggcgt ggtcgccgag 

601 cgtcccgagg cggaggaact cgcgctcgac gcctatggcc gcaacctcgg 

651 catcgctttc cagctggtcg acgacgcgat cgactatgtc tcggacgcgt 

701 cgacgatggg caaggatgcc ggcgacgatt tccgcgaagg caagatgacg
751 ctgccggtgg tcctggcgta cgcgcgcggc gacgaggcgg aacgcggctt
801 ctggaaggaa gcgatttcgg gccgccgcat ctcggacgag gatttcgccg

851 aggcgatccg gctggtgcag agctgccgcg cggtggacga cacgctcgcc
901 cgtgcccgcc attacggcca gctcgcgatc gatgcgctgg gcggcttccg 
951 cgcctgcgag gcgaaggacg cgatggtcga ggcggtcgaa ttcgcggtgg 

1001 cgcgcgccta ctga (SEQ ID NO: 41) 
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Figure 12 



1 msatvhrlgs rtqpsldpim alvaqdmnlv navildrmqs eiplipelag
51 hliagggkrm rpmltlasar llgysgtrhh klaaavefih tatllhddvv
101 dssdlrrgrr taniiwgnpa svlvgdflfs rsfelmveae slkalhilsn
151 asaviaegev nqltavrrid lsedryldii gaktaalfaa acrvagvvae
201 rpeaeelald aygrnlgiaf qlvddaidyv sdastmgkda gddfregkmt
251 lpvvlayarg deaergfwke aisgrrisde dfaeairlvq scravddtla
301 rarhygqlai dalggfrace akdamveave favaray (SEQ ID NO: 42)
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Figure 13 (page 1 of 5) 



RSddsdna 372 atg ggattggac
STddsdna 605 atg agcgcaacc
SPddsdna 1 atgattcagtatgtatatttaaaacatatgaggaaattat
GSddsdna 1
RCddsdna 1 atg gccatcga-

RSddsdna 384 ga ggtttcgcaaaagccgcat gaac
STddsdna 617 gtccaccgcctgggctcgcgaacccagccttcgctcgatc
SPddsdna 41 gg agtcttggaaaagtccgtt cgac
GSddsdna 1
RCddsdna 12 tttc aa gcaa gata

RSddsdna 409 ggctcgccgcgtggctggccgaggacatggccgccgtca-
STddsdna 657 cgatcatggcgctggtcgcccaggacatgaacctggtga-
SPddsdna 66 tgttcttcggttttct--actacgaaccgcaatgcttcac
GSddsdna 1 atgctggcctgca-
RCddsdna 26 ttctcg-ctcctg--ttgctcaagattttgcagcgatgg-

RSddsdna 448 acgggctgatccgcgagcggatggcctcgaaaca cgc
STddsdna 696 acgcggtgatcctcgatcgcatgcagtccgagat c--
SPddsdna 104 atttaattaaaaacgag ttggaacaaatctc
GSddsdna 14 accgggcgatcatcgcccggatg gaaagt ccg
RCddsdna 62 accagtttattaatgaaggaatcagctccaaggt cgc

RSddsdna 485 g--ccccgcattc ccgaggtca cggcgc
STddsdna 731 ccgctgatcc ccgaactcg- ccggcc
SPddsdna 135 a--ccagggattcgtcaaatgctgaattcaaattcagaat
GSddsdna 46 gttcccctgatcc cgcagcttg gcgccc
RCddsdna 99 a--ctggtcatgt c agtca gcaagc

RSddsdna 511 atctggtcgag gccggcgg
STddsdna 756 atctgatcgct ggcggcgg
SPddsdna 173 ttcttgaagagtgttctaaatattataccattgctcaagg
GSddsdna 74 atcttgtcgcg gcgggagg
RCddsdna 122 atgtcgttgaa gcaggtgg

RSddsdna 530 caagcggctgcggccgc tcctgacgctcgcc
STddsdna 775 caagcggatgcggccga tgctgacgctcgcc
SPddsdna 213 aaaacaaatgcgtccttctcttgttttgctgatgtccaaa
GSddsdna 93 caagcgccttcgcccgc tgctgacgctggcc
RCddsdna 141 aaagcgcatgcgtccga ttatg-tgcttgct
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Figure 13 (page 2 of 5) 



RSddsdna 561 gcggcgcggctgtgc ggctacgag--gggccc
STddsdna 806 agcgcccggctgctc ggctattcg--ggcacg
SPddsdna 253 gctacaagcttgtgccatggtattgat--cggtccgtagt
GSddsdna 124 tccgcacgtctgtgc ggttatcagccgggtcc
RCddsdna 171 g gccgct-tat gcctgtggt--gaaacc

RSddsdna 591 t atcacatc
STddsdna 836 c gccaccac
SPddsdna 291 gggcgacaaatatattgatgatgatgat ttaagatc
GSddsdna 156 ggaccatcagcgt
RCddsdna 196 a atttaaag

RSddsdna 600 cat ctggccgcgacggtg
STddsdna 845 aag ctggcggcggcagtg
SPddsdna 327 att ttcgacgggtcaaattcttccttctcaa
GSddsdna 169 catgtcggg ctcgccgcctgcgtt
RCddsdna 205 catgcacagaagctggcggccattatt

RSddsdna 618 gagttcatccacacggcga
STddsdna 863 gagttcatccacaccgcga
SPddsdna 358 ttgagattagcacaaataaccgagatgatccatatagcaa
GSddsdna 193 gagttcattcataccgcca
RCddsdna 232 gaaatgctgcatacggcga

RSddsdna 637 cgctgcttcacgacgatgtggtggacgaaagccaccgccg
STddsdna 882 cgctgctgcatgacgacgtggtcgacagctcggacctgcg
SPddsdna 398 gtttgctgcatgacgatgtgattgatcacgctaatgtccg
GSddsdna 212 cactgctgcatgatgatgtcgtggatgagagcacgttgcg
RCddsdna 251 ctctggtacatgatgatgatgtagatgagtctggcttacg

RSddsdna 677 ccgcggcaaacccacg-gcgaacctgctgtgggacaacaa
STddsdna 922 ccgcggccgccgcacc-gccaacatcatctggggcaatcc
SPddsdna 438 tagaggctcaccttcaagcaatgttgctttcgg ta
GSddsdna 252 tcgggggctggcttcg-gccaatgccgtgttcggcaacaa
RCddsdna 291 ccgtggcagaccaaca-gcaaatgcgacatggaataacca

RSddsdna 716 atcctcggtg ctggtgggcgactatctcttcgcccg
STddsdna 961 cgccagcgtg ctggtcggcgacttcctgttcagccg
SPddsdna 473 atcgacggtcaatccttgcgggtaatttcatccttgcacg
GSddsdna 291 ggcgtccgtg ctggtaggtgacttcctgttcgcccg
RCddsdna 330 gactgcggta ctggtgggggattttctgattgcccg

Figure 13 (page 3 of 5)

RSddsdna 752 cagcttccagctgatggtcgagaccggctcg cttc
STddsdna 997 ctcgttcgagctgatggtcgaggccgaaagc ctca
SPddsdna 513 g-gcttcga ctgctatggcccgccttcgaaatcccc
GSddsdna 327 ctcgttccagcttatgacagcagacggctcc ctga
RCddsdna 366 ggcatttgatctgctggttgatctggacaat--atga

RSddsdna 787 gcgtgatggacatcctcgccaatgcctcggccaccatctc
STddsdna 1032 aggcgctgcacatcctgtcgaacgccagcgcggtgatcgc
SPddsdna 548 aagttacggagttgttagctacagtgatagcagacttggt
GSddsdna 362 aggtcatggcgatcctgtcggatgcatcggcgacaattgc
RCddsdna 401 tcctgttaaaggacttctctacaggaacctgtgagattgc

RSddsdna 827 cgagggcgaggtgctgcagctgaccgcgg--cccaggatc
STddsdna 1072 cgagggcgaagtcaaccagctgaccgcggtgcgccggatc
SPddsdna 588 tcgaggtgagtttttgcagctaaaaaata--ctatggat-
GSddsdna 402 tgaaggtgaagtccttcagatggtcgtgc--agaacgacc
RCddsdna 441 tgagggtgaagtattgcagttgc agg--cacagcatc

RSddsdna 865 tgcgc acgaccgaggacatccacc
STddsdna 1112 --gac ctgtccgaggatcgctatc
SPddsdna 625 --cct tcatctttggaaataaaacaatcaaattttga
GSddsdna 440 ttacg acgcctgtagaacgctatc
RCddsdna 476 agccagatacaacagaagatatttatt

RSddsdna 889 tgcaggtggtgcgcggcaagacggccgcgct
STddsdna 1134 tcgacatcatcggcgccaagactgcggcgct
SPddsdna 660 ctattatattgaaaaaagttttttg-aaaacagccagttt
GSddsdna 464 ttgaagtcattcacggcaagacggctgcgct
RCddsdna 503 tacagattattcacggtaaaacctcacggtt

RSddsdna 920 ctttgccgcggcaaccgaggtgggcggcgtggtcg
STddsdna 1165 gttcgccgccgcctgccgggtggcgggcgtggtcg
SPddsdna 699 aatttcca aaagctgcaaggcttctacaatcct
GSddsdna 495 gtttgcggctgcctgccgtgtcggcgctgtcgtgg
RCddsdna 534 gttcgaactggcgaccgaaggcgctgcaatactgg

RSddsdna 955 cgggc gtgcccgaggcgcaggtcgaggcgctccacgc
STddsdna 1200 ccgag cgtcccgaggcggaggaactcgcgctcgacgc
SPddsdna 732 cggacaatgttctcctactgtagcaacagctgctgga-ga
GSddsdna 530 ccgag cgtccggaagcagaagaggaagctctggagcg
RCddsdna 569 caggc aaacctga ataccgtgaacctttacgt

Figure 13 (page 4 of 5)

RSddsdna 992 c tacggggacgcgctggggatcgccttccagatcgtc
STddsdna 1237 c tatggccgcaacctcggcatcgctttccagctggtc
SPddsdna 771 a tacggtcgatgcattggtactgcttttcaactaatg
GSddsdna 567 g tttggcaccaatctgggtatggcgttccagcttgtt
RCddsdna 601 cgttttgccggacactttggcaat-gcttttcagattatt

RSddsdna 1029 gacgacctcctcgattatggcggcgtg-gatgcccagatc
STddsdna 1274 gacgacgcgatcgactatgtctcggac-gcgtcgacgatg
SPddsdna 808 gatgacgtgttggactat-acgtcgaaagatgatacttta
GSddsdna 604 gatgatgccctggattatgccgcagac-cagcaggttttg
RCddsdna 640 gatgatattctggattacacttcagat-gctgatacgctc

RSddsdna 1068 ggcaagaacaccggcgacgacttcc-gcgaacgcaagctg
STddsdna 1313 ggcaaggatgccggcgacgatttcc-gcgaaggcaagatg
SPddsdna 847 ggaaaggcggctggtgcagatttgaagctagggttggcta
GSddsdna 643 ggcaagaccgttggtgatgacatgc-gtgaaggcaagatc
RCddsdna 679 ggcaaaaatattggcgatgacttga-tggaaggcaaaccc

RSddsdna 1107 acgctgccggtcatcaaggcggtggcccaggccgatgcc-
STddsdna 1352 acgctgccggtggtcctggcgtacgcgcgcggcgacgag-
SPddsdna 887 cagct-cccgtcctctttgc-atggaaaaagt--atcca-
GSddsdna 682 accctgccggtcct ggccgcctatgaggctggct
RCddsdna 718 accctgccgctgattgcagcaatgcaaaatactcaaggt-

RSddsdna 1146 gaggagcgcgccttctggcagcgggtgatcgagaa
STddsdna 1391 gcggaacgcggcttctggaaggaagcgatttcg--
SPddsdna 922 ga acttggtgca atgattgtgaa
GSddsdna 716 cgccggaagatcgtattttctgggagcgcgtcattggaga
RCddsdna 757 gaacagcgcgacctgatccgtcgc agca

RSddsdna 1181 gggcgaccagcgcgagggtgac--ctcgagcaagcccatg
STddsdna 1424 ggccgccgcatctcggac--gaggatttcgccgagg
SPddsdna 945 tagattcaatcatccttctgat--atccaacgggctcgtt
GSddsdna 756 aggggagcagactgaggacgat--ctgcctcatgctctga
RCddsdna 785 ttgccactggcg-gtacttcacagcttgaacaagttattg

RSddsdna 1219 cgatca tgtcccgccacggcgccatggaggc--c
STddsdna 1458 cgatccggctggtgcagagctgccgcgcggtggacga--c
SPddsdna 983 ctttgg ttgagtgcactgatgctatcgagca--a
GSddsdna 794 acctga ttgcaaagacgggtgcgatcaatacgac
RCddsdna 824 cgattg tacaaaattcgggagcgctgga

Figure 13 (page 5 of 5)

RSddsdna 1251 gcccggcaggatgcgctccgctgggtcacggtggcgcgcg
STddsdna 1496 acgctcgcccgtgcccgccattacggccagctcgcgatcg
SPddsdna 1015 accatcacttgggcaaaagaatatatcaaaaaagccaaag
GSddsdna 828 gatcgcccg--cgcgcaggtctatgccgacgcagctgttg
RCddsdna 852 ttattgccataagcgtgctactgaagaaaccgagcgagca

RSddsdna 1291 aggcactcggccagctgccggagcacccgctgcgcg
STddsdna 1536 atgcgct-gggcggcttcc-gcgcctgcgaggcgaa
SPddsdna 1055 attcccttctgtgtctccctgattcacctgcaagga
GSddsdna 866 aagccctgtccattttcccggatagcgaactgcgcc
RCddsdna 892 ttacaggcactagaaatattacctgagagtacttaccggc

RSddsdna 1327 agatgc--tgcacgatctggccgatttcgtggtcgaacgc
STddsdna 1570 ggacgcgatggtcgaggcggtcgaattcgcggtggcgcgc
SPddsdna 1091 aggcac--tttttgcgttggctgataaagtaataacgaga
GSddsdna 902 gccttc--tgatcgaaacggttcagttcacggtgaatcgg
RCddsdna 932 aggcgc--tggttaacttgacccgcttagctttagaccga

RSddsdna 1365 atcgcctga
STddsdna 1610 gcctactga
SPddsdna 1129 aagaagtga
GSddsdna 940 gcccgctaa
RCddsdna 970 atccaataa



SUBSTITUTE SHEET (RULE 26) 



WO 02/26933 



PCT/US01/30328 



60/97 



Figure 14 (page 1 of 2) 
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372 mgldevsq kphe 

605 msatv hrlgsrtq psld 
1 miqyvylkhmrklwslgkvrstvlrfsttn 
1 
1 maidf kq 

408 rlaawlae-dmaavnglirermaskhapri 
656 pimalvaq-dmnlvnavildrmqse-ipli 
31 rnashlikneleqispgirq-mlnsnsefl 
1 mlacnraiiarmesp-vpli 
8 dilapvaq-dfaamdqfinegisskva-lv 

495 pevtahlveaggkrlrplltla aarlc 
740 pelaghliagggkrmrpmltla sarll 
60 eecskyytiaqgkqmrpslvllmskatslc 
20 pqlgahlvaaggkrlrplltla sarlc 
36 msvskhvveaggkrmrpimcll aayac 

576 gye-gp- 
821 gys-gt- 
90 hgidrsvvgdkyiddddlrsfstgqi-lp- 
47 gyqpgpd 
63 get-nl- 

591 --yhih-laatvefihtatllhddvvdesh 
836 --rhhk-laaavefihtatllhddvvdssd 
118 --sqlr-laqitemihiasllhddvidhan 
54 hqrhvg-laacvefihtatllhddvvdest 
68 --khaqklaaiiemlhtatlvhdddvdesg 

672 rrrgkptanllwdnkssvlvgdylfarsfq 
917 lrrgrrtaniiwgnpasvlvgdflfsrsfe 
145 vrrgspssnvafgnrrsilagnfilarast 
83 lrrglasanavfgnkasvlvgdflfarsfq 
96 lrrgrptanatwnnqtavlvgdfliarafd 

762 lmvetgslrvmdilanasatisegevlqlt 
1007 lmveaeslkalhilsnasaviaegevnqlt 
175 amarlrnpqvtellatviadlvrgeflqlk 
113 lmtadgslkvmailsdasatiaegevlqmv 
126 llvdldnmillkdfstgtceiaegevlqlq 
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Figure 14 (page 2 of 2) 

852 aaqdlrtte dihlqvvrgktaalf 
1097 avrridlse dryldiigaktaalf 
205 ntmdpssleikqsnfdyyieksflktasli 
143 vqndlttpv erylevihgktaalf 
156 aqhqpdtte diylqiihgktsrlf 

924 aaatevggvvagvpeaqvealhaygdalgi 
1169 aaacrvagvvaerpeaeelaldaygrnlgi 
235 sksckastilgqcsptvataageygrcigt 
167 aaacrvgavvaerpeaeeealerfgtnlgm 
180 elategaailagkpeyr-eplrrfaghfgn 

1014 afqivddlldyggvdaqigkntgddfrerk 
1259 afqlvddaidyvsdastmgkdagddfregk 
265 afqlmddvldytskddtlgkaagadlklgl 
197 afqlvddaldyaadqqvlgktvgddmregk 
209 afqiiddildytsdadtlgknigddlmegk 

1104 ltlpvikavaqadaeerafwqrviekgdq- 
1349 mtlpvvlayargdeaergfwkeaisgrri- 
295 atapvlfa wkkypelgami 
227 itlpvlaayeagspedrifwervigegeq- 
239 ptlpliaamqntqgeqrdlirrsiatggt- 

1191 regdleqahaimsrhgameaarqda 
1436 sdedfaeairlvqscravddtlara 
314 vnrfnhpsdiqrarslvectdaieqtitwa 
256 teddlphalnliaktgainttiara 
268 sq--leqviaivqnsgaldychkra 

1266 lrwvtvarealgqlpehplremlhdladfv 
1511 rhygqlaidalggfraceakdamveavefa 
344 keyikkakdsllclpdsparkalfaladkv 
281 qvyadaavealsifpdselrrllietvqft 
291 teeteralqaleilpestyrqalvnltrla 

1356 veria* 
1601 varay* 
374 itrkk- 
311 vnrar- 
321 ldriq- 
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Figure 15 (page 1 of 2) 

Hidxsp 1 mtnnmnnypllslinspedlrllnkdqlpqlcqelrayllesvsqtsghl 
Ecdxsp 1 msfdiakyptlalvdstqelrllpkeslpklcdelrrylldsvsrssghf 
Hpdxsp 1 milqnktfdlnpndiaglelvcqtlrnrilevvsangghl 

Hidxsp 51 asglgtveltvalhyvyktpfdqliwdvghqayphkiltgrreqmstirq 
Ecdxsp 51 asglgtveltvalhyvyntpfdqliwdvghqayphkiltgrrdkigtirq 
Hpdxsp 41 ssslgavelivgmhalfdcqknpfifdtshqayahklltgrfesfstlrq 

Hidxsp 101 kdgihpfpwreesefdvlsvghsstsisaglgiavaaerenagrktvcvi 
Ecdxsp 101 kgglhpfpwrgeseydvlsvghsstsisagigiavaaekegknrrtvcvi 
Hpdxsp 91 fqglsgftkpsesaydyfiaghsstsvsigvgvakafrlkqtlgmpiall 

Hidxsp 151 gdgaitagmafealnhagalhtdmlvilndnemsisenvgalnnhlarif 
Ecdxsp 151 gdgaitagmafeamnhagdirpdmlvilndnemsisenvgalnnhlaqll 
Hpdxsp 141 gdgsisagifyealnelgdrkypmimilndnemsistpigalskalsqlm 

Hidxsp 201 sgslystlrdgskkildkvppiknfm-kkteehmkgvmfspestlfeelg 
Ecdxsp 201 sgklysslreggkkvfsgvppikell-krteehikgmvv-pgtlfeelg 
Hpdxsp 191 kgpfyqsfrskvkkilstlpesvnylasrfeesfk--litp-gvffeelg 

Hidxsp 250 fnyigpvdghnidelvatltnmrnlkgpqflhiktkkgkgyapaekdpig 
Ecdxsp 248 fnyigpvdghdvlglittlknmrdlkgpqflhimtkkgrgyepaekdpit 
Hpdxsp 238 inyigpinghdlgtiietlklakelkepvlihaqtlkgkgykiaegryek 

Hidxsp 300 fhgvpkfdpisgelpknnsk-ptyskifgdwlcemaekdakiigitpamr 
Ecdxsp 298 fhavpkfdpssgclpkssgglpsyskifgdwlcetaakdnklmaitpamr 
Hpdxsp 288 whgvgpfdldtglskksksatlspteaysntllelakkdekivgvtaamp 

Hidxsp 349 egsgmvefsqrfpkqyfdvaiaeqhavtfatglaiggykpvvaiystflq 
Ecdxsp 348 egsgmvefsrkfpdryfdvaiaeqhavtfaaglaiggykpivaiystflq 
Hpdxsp 338 sgtgldklidayplrffdvaiaeqhaltsssamakegfkpfvsiystflq 

Hidxsp 399 raydqlihdvaiqnlpvlfaidragivgadgathqgafdisfmrcipnmi 
Ecdxsp 398 raydqvlhdvaiqklpvlfaidragivgadgqthqgafdlsylrcipemv 
Hpdxsp 388 raydsivhdacisslpiklaidragivgedgethqglldvsylrsipnmv 

Hidxsp 449 imtpsdenecrqmlytgyqcgk-paavryprgn-avgvkltplemlpigk 
Ecdxsp 448 imtpsdenecrqmlytgyhyndgpsavryprgn-avgveltpleklpigk 
Hpdxsp 438 ifaprdnetlknavyfanehdsspcafryprgsfalkegvfepsgfvlgr 




Figure 15 (page 2 of 2) 

Hidxsp 497 srlirkgqkiailnfgtllpsa--lelsek lnatvvdmrfvkpidie 
Ecdxsp 497 givkrrgeklailnfgtlmpea--akvaes lnatlvdmrfvkpldea 
Hpdxsp 488 sellkkegeilligygngvgrahlvqlalkekniecalldlrflkpldhn 

Hidxsp 542 minvlaqthdylvtleenaiqggagsavaevlnssgkstallqlglpdyf 
Ecdxsp 542 lilemaashealvtveenaimggagsgvnevlmahrkpvpvlniglpdff 
Hpdxsp 538 l-saiiapyqklyvfsdnyklggvasaileflseqnilkpvksfeitdef 

Hidxsp 592 ipqatqqealadlgldtkgieekilnfiakqgnl 
Ecdxsp 592 ipqgtqeemraelgldaagmeakikawla 
Hpdxsp 587 imhgntalvekslgldtesltdailkdlgqer-- 
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Figure 16 

Rpodsp 1 --mniivkiqqnlkdevtqlndliisclksdaeliekvgkylveaggkri 
Ecoppp 1 mnlekinel taqdmagvnaaileqlnsdvqlinqlgyyivsgggkri 
Gsddsp 1 mlacnraiiarmespvplipqlgahlvaaggkrl 
Rcsdsp 1 maidfkqdilapvaqdfaamdqfinegisskvalvmsvskhvveaggkrm 

Rpodsp 49 rpll tiitakmfdykgn nhiklasavefihaatllhddvvdnstlr 
Ecoppp 48 rpmiavlaaravgyegna hvtiaaliefihtatllhddvvdesdmr 
Gsddsp 35 rplltlasarlcgyqpgpdhqrhvglaacvefihtatllhddvvdestlr 
Rcsdsp 51 rpimcllaayacg-etnlkhaqk--laaiiemlhtatlvhddvvdesglr 

Rpodsp 95 rfkptanviwgsktsilvgdflfsqsfklmvasgcikamnvlakasviis 
Ecoppp 94 rgkatanaafgnaasvlvgdfiytrafqmmtslgslkvlevmseavnvia 
Gsddsp 85 rglasanavfgnkasvlvgdflfarsfqlmtadgslkvmailsdasatia 
Rcsdsp 98 rgrptanatwnnqtavlvgdfliarafdllvdldnmillkdfstgtceia 

Rpodsp 145 egevvqlvklnerriitideyqqivksktaelfgaacevgaiiaeqvdrv 
Ecoppp 144 egevlqlmnvndpdi-teenymrviysktarlfeaaaqcsgilagctpee 
Gsddsp 135 egevlqmvvqndltt-pverylevihgktaalfaaacrvgavvaerpeae 
Rcsdsp 148 egevlqlqaqhqpdt-tediylqiihgktsrlfelategaailagkpe-y 

Rpodsp 195 skdvqnfgrllgtifqviddlldylgsdkqvgknigddflegkvtlplif 
Ecoppp 193 ekglqdygrylgtafqliddlldynadgeqlgknvgddlnegkptlpllh 
Gsddsp 184 eealerfgtnlgmafqlvddaldyaadqqvlgktvgddmregkitlpvla 
Rcsdsp 196 replrrfaghfgnafqiiddildytsdadtlgknigddlmegkptlplia 

Rpodsp 245 lyhkleqdkqlwlenmlksdk--rtkddfvkirdlmlkhaiynetvnyls 
Ecoppp 243 amhhgtpeqaqmirtaieqgngrhllepvleamnac gslewtrqrae 
Gsddsp 234 ayeagspedrifwervi--gegeqteddlphalnliaktgainttiaraq 
Rcsdsp 246 amqntqgeqrdlirrsiatggtsqle qviaivqnsgaldychkrat 

Rpodsp 293 sleneannllnkipvqniykyylfsiirfilyrsy 
Ecoppp 290 eeadkaiaalqvlpdtpw-realiglahiavqrdr 
Gsddsp 282 vyadaavealsifpdsel-rrllietvqftvnrar 
Rcsdsp 292 eeteralqaleilpesty-rqalvnltrlaldriq 
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Figure 17 

Rpodsp 1 mniivkiqqnlkdevtqlndliisclksdaeliekvgkylve 
Ecodsp 1 mnlekineltaq dmagvnaaileqlnsdvqlinqlgyyivs 
Hiodsp 1 mkkqdlmsideiqkladp dmqkvnqnilaqlnsdvpligqlgfyivq 
Gsddsp 1 mlacnraiiarmespvplipqlgahlva 
Rcsdsp 1 maidfkqdilapvaqdfaamdqfinegisskvalvmsvskhvve 

Rpodsp 43 aggkrirplltiitakmfdykgn nhik-lasavefihaatllhddv 
Ecoppp 42 gggkrirpmiavlaaravgyegna hvt-iaaliefihtatllhddv 
Hiods 142 gggkrirpliavlaarslgfegsn s it-catfvefihtasllhddv 
Gsddsp 29 aggkrlrplltlasarlcgyqpgpdhqrhvg-laacvefihtatllhddv 
Rcsdsp 45 aggkrmrpimcllaayac getnlkhaqklaaiiemlhtatlvhddv 

Rpodsp 88 vdnstlrrfkptanviwgsktsilvgdflfsqsfklmvasgcikamnvla 
Ecoppp 87 vdesdmrrgkatanaafgnaasvlvgdfiytrafqmmtslgslkvlevms 
Hiods 277 vdesdmrrgratanaefgnaasvlvgdfiytrafqlvaqleslkilsima 
Gsddsp 78 vdestlrrglasanavfgnkasvlvgdflfarsfqlmtadgslkvmails 
Rcsdsp 91 vdesglrrgrptanatwnnqtavlvgdfliarafdllvdldnmillkdfs 

Rpodsp 138 kasviisegevvqlvklnerriitideyqqivksktaelfgaacevgaii 
Ecoppp 137 eavnviaegevlqlmnvndpdi-teenymrviysktarlfeaaaqcsgil 
Hiods 427 datnvlaegevqqlmnvndpet-seanymrviysktarlfevagqaaaiv 
Gsddsp 128 dasatiaegevlqmvvqndltt-pverylevihgktaalfaaacrvgavv 
Rcsdsp 141 tgtceiaegevlqlqaqhqpdt-tediylqiihgktsrlfelategaail 

Rpodsp 188 aeqvdrvskdvqnfgrllgtifqviddlldylgsdkqvgknigddflegk 
Ecoppp 186 agctpeeekglqdygrylgtafqliddlldynadgeqlgknvgddlnegk 
Hiods 574 aggteaqekalqdygrylgtafqlvddvldysantqalgknvgddlaegk 
Gsddsp 177 aerpeaeeealerfgtnlgmafqlvddaldyaadqqvlgktvgddmregk 
Rcsdsp 190 agkpeyre-plrrfaghfgnafqiiddildytsdadtlgknigddlmegk 

Rpodsp 238 vtlpliflyhkleqdkqlwlenmlksd--krtkddfvkirdlmlkhaiyn 
Ecoppp 236 ptlpllhamhhgtpeqaqmirtaieqgngrhllepvleamnac gsle 
Hiods 724 ptlpllhamrhgnaqqaalireaieqggkreaidevlaimteh ksld 
Gsddsp 227 itlpvlaayeagspedrifwervi--gegeqteddlphalnliaktgain 
Rcsdsp 239 ptlpliaamqntqgeqrdlirrsiatggtsqleqviaivqns gald 

Rpodsp 286 etvnylssleneannllnkipv--qniykyylfsiirfilyrsy- 
Ecoppp 283 wt rqraeeeadkaiaalqvlpdtpwrealiglahiavqrdr- 
Hiods 865 ya mnrakeeaqkavdaieilpeseykqalislaylsvdrny* 
Gsddsp 275 tt iaraqvyadaavealsifpdselrrllietvqftvnrar- 
Rcsdsp 285 yc hkrateeteralqaleilpestyrqalvnltrlaldriq- 
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FIG. 18 
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FIG. 26 
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Figure 27 (page 1 of 2) 



Bsdxrp 1 mknicllgatgsigeq 

Hmdxrp 1 mgkqnivilgstgsigks 

Ecdxrp 1 mkqltilgstgsigcs 

Zmdxrp 1 msqprtvtvlgatgsighs 

Sldxrp 1 mkavtllgstgsigtq 

Ssdxrp 1 mvkrisilgstgsigtq 



Mtdxrp 1 matggrvvirrrgdnevvahndevtnstdgradgrlrvvvlgstgsigtq 

Bsdxrp 17 tldvlrahqdqfqlvsmsfg-rnidkavpmievfqpkfvsvgdldtyhkl 
Hmdxrp 19 tlsviennpqkyhafalvgg-knveamfeqcikfrphfaalddvnaakil 
Ecdxrp 17 tldvvrhnpehfrvvalvag-knvtrmveqclefspryavmddeasakll 
Zmdxrp 20 tldliernldryqvialtan-rnvkdladaakrtnakraviadpslyndl 
Sldxrp 17 tldileqypdrfrlvglaag-rnvallseqirrhrpeivaiqdaaqlsel 
Ssdxrp 18 tldivthhpdafqvvglaag-gnvallaqqvaefrpeivairqaekledl 
Mtdxrp 51 alqviadnpdrfevvglaaggahldtllrqraqtgvtniavadehaaq-- 

Bsdxrp 66 kqmsf sf ec qiglgeeglieaavmeevdivvnallgsvgliptlkai 
Hmdxrp 68 rekli-ahhiptevlagrraicelaahpdadqimasivgaagllptlsav 
Ecdxrp 66 ktmlq-qqgsrtevlsgqqaacdmaaledvdqvmaaivgaagllptlaai 
Zmdxrp 69 keala gssveaaagadalve-aammgadwtmaaiigcaglkatlaai 
Sldxrp 66 qaaiadl-dnppliltgeagvtevarygdaeivvtgivgcagllptiaai 
Ssdxrp 67 kaavaeltdyqpmywgeegvvevarygdaesvvtgivgcagllptmaai 
Mtdxrp 99 rvgdip yhgsdaatrlveqteadvvlnalvgalglrptlaal 

Bsdxrp 113 eqkktialanketlvtaghivkehakkydvpllpvdsehsaifqalqg-- 
Hmdxrp 117 kagkrvllankeslvtcgqlfidavknygskllpvdsehnaifq s-l 
Ecdxrp 115 ragktillankeslvtcgrlfmdavkqskaqllpvdsehnaifq s-l 
Zmdxrp 115 rkgktvalankeslvsaggliriidavrehgttllpvdsehnaifq c-f 
Sldxrp 115 eagkdialanketliaagpvvlpllqkhgvtitpadsehsaifqciqg-l 
Ssdxrp 117 aagkdialanketliagapvvlplvekmgvkllpadsehsaifqclqg-v 
Mtdxrp 141 ktgarlalankeslvaggslvlraarpg--qivpvdsehsalaqclrggt 

Bsdxrp 161 -eqak nierliitasggsfrdktreelesvtvedalkh 
Hmdxrp 163 ppeaqekigfcplsel-gvskiiltgsggpfrytpleqftnitpeqavah 
Ecdxrp 161 pqpiqhnlgyadleqn-gvvsilltgsggpfretplrdlatmtpdqacrh 
Zmdxrp 161 phhnrdy vrriiitasggpfrttslaematvtperavqh 
Sldxrp 164 sthad f rpaqvvaglrrilltasggafrdwpverlsqvtvadalkh 
Ssdxrp 166 pe gglrriiltasggafrdlpverlpfvtvqdalkh 
Mtdxrp 189 pde vaklvltasggpfrgwsaadlehvtpeqagah 
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Bsdxrp 198 pnwsmgakitidsatmmnkglevieahwlfdipyeqidvvlhkesiihsm 
Hmdxrp 212 pnwsmgkkisvdsatmmnkgleyiearwlfnasaeemeviihpqsiihsm 
Ecdxrp 210 pnwsmgrkisvdsatmmnkgleyiearwlfnasasqmevlihpqsvihsm 
Zmdxrp 200 pnwsmgakisidsatmmnkglelieayhlfqiplekfeilvhpqsvihsm 
Sldxrp 210 pnwsmgrkitvdsatlmnkglevieahylfgldydyidivihpqsiihsl 
Ssdxrp 202 pnwsmgqkitidsatlmnkglevieahylfgldydhidivihpqsiihsl 
Mtdxrp 224 ptwsmgpmntlnsaslvnkgleviethllfgipydridvvvhpqsiihsm 

Bsdxrp 248 vefhdksviaqlgtpdmrvpiqyaltypdrlplpdakrlelweigslhfe 
Hmdxrp 262 vryvdgsvitqmgnpdmrtpiaetmayphrtfa-gvepldffkikeltfi 
Ecdxrp 260 vryqdgsvlaqlgepdmrtpiahtmawpnrvns-gvkpldfcklsaltfa 
Zmdxrp 250 veyldgsilaqigspdmrtpightlawpkrmet-paesldftklrqmdfe 
Sldxrp 260 ieledtsvlaqlgwpdmrlpllyalswpdrlst-qwsaldlvkagslefr 
Ssdxrp 252 ievqdtsvlaqlgwpdmrlpllyalswperiyt-dwepldlvkagslsfr 
Mtdxrp 274 vtfidgstiaqasppdmklpislalgwprrv-sgaaaacdfhtasswefe 

Bsdxrp 298 kadfdrfrclqfafesgkiggtmptvlnaanevavaaflagkipflaied 
Hmdxrp 311 epdfnrypnlklaidafaagqyattamnaaneiavqafldrqigfmdiak 
Ecdxrp 309 apdydrypclklameafeqgqaattalnaaneitvaaflaqqirftdiaa 
Zmdxrp 299 apdyerfpaltlamesiksggarpavmnaaneiavaafldkkigfldiak 
Sldxrp 309 epdhakypcmdlayaagrkggtmpavlnaaneqavalfleeqihfsdipr 
Ssdxrp 301 epdhdkypcmqlaygagraggampavlnaaneqavalflqekisfldipr 
Mtdxrp 323 pldtdvfpavelarqagvaggcmtavynaaneeaaaaflagrigfpaivg 

Bsdxrp 348 cieka--ltrhqllkkpswr tfkkwtk ipgdtsiqysh 
Hmdxrp 361 inskt--ierispytiqniddvleidaqare ia-ktllre-- 
Ecdxrp 359 lnlsv--lekmdmrepqcvddvlsvdanare varkevmrlas 
Zmdxrp 349 ivekt--ldhytpatpssledvfaidnear iqaaalmeslp 
Sldxrp 359 lieracdrhqtewqqqpslddilaydawarqf v qasyqslesw 
Ssdxrp 351 liektcdlyvgqntaspdletilaadqwarrtv lensacvatrp 
Mtdxrp 373 iiadvlhaadqwavepatvddvldaqrwareraqravsgmas vaiastak 

Bsdxrp 384 kvvcs 
Hmdxrp 398 
Ecdxrp 399 
Zmdxrp 388 a 
Sldxrp 403 
Ssdxrp 395 
Mtdxrp 423 pgaagrhastlers 
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Figure 28 

ggcccgggctggtggggtttctggcgctggggctggtgttcggcgcgttcttcttcgtcg 

cgatcgtgacgcggaacgccaagctggcggcggggcaggtctatgtcgggctgccggtgc 

tcgcgctgctgctgctccgcgaccatccgcagggctttgccgcgacgctgtggacgatgg 

cgatcgtctgggtgtgcgacagcggcgcctattttgccggtcgcgcgatcggtgggccca 

agctcgcgccctcgatcagcccgaacaagacctgggcggggctgatcggcgggttggttg 

ccgcgatcctgttctccgccggctatgtcgcgctggcgccggggagcgcgatcggctggt 

ggctggtcgcggtgtcgccgctggtagccttcgcctcgcagatcggcgacctgtacgaga 

gccatctcaagcgggtcgcgggcgtgaaggattcgagcaacctgctgcccggccatggcg 

gcattctcgaccggctcgacggccttgtcttcgcagccccggttgcagctttgttttttg 

cgatccatcatcaggtggtcgtgggaggatactggtggtgaagcgcgtcacggtgttggg 

ggcgaccggctcggtcggcacctcgacgctggatctgatcgaacgaaatccgcacgcctt 

cgaagtcgtggcgctgaccgcaaattgcgatgtcgagaagctggctgccgcggcgatccg 

cacgcgcgcgcgctgcgccgtggtcgccgacgagaaatgcctgccggcgctacaggagcg 

gctggccggcagcggtgtcgaggcgatgggcggggcgcattcggtgtgcgacgtggcgcg 

gatgggtgctgactggacgatggctgcgatcgtcggcagcgcagggctcaagccggtgat 

ggccgcgctggaggccggtggcaccgtcgcgctcgcgaacaaggagtcgctcgtctcggc 

gggtgaggtgatgatggcggcggcccgcgcgcatggcgcgacgctgctgccggtcgattc 

ggagcacaatgcggtgttccagtgcctcgatcgcaccgcgcccaggggcgtccgccggat 

catccttaccgccagcggtggtccgttccgcgcgacgccgaaggaagcgatgcgcgacat 

cacccccgcacaggcggtggcgcatcccaactggtcgatgggcgccaagatctcggtcga 

ctccgcgacgatgatgaacaaggggctcgaactgatcgaagccttccacctgttcccggt 

cgccgccgagcaactggccgtgctggtccatcgccaatccgtcgtccattcgatggtgga 

atatgtcgacggatcggtgctggcccagctcggcacgcccgacatgcgcacgccgatcgc 

ctatgcgctggcttggcccgagcggatggagacgctgtgcccgccgctcgaccttgccac 

ggtgggtaagctcgagttcgaaaatcccgatctcgatcgcttcccggcgctcgcgctggc 

gatggaggcattgaaggcgggcggggcgcgtccggccattctcaatgccgccaacgaagt 

cgccgtcgcggcctttctcgccgggcggatcggattccttgaaattgccgcaatctctgc 

cgatacgctgtctcgctatgacccggccgcgccggaaacgctcgatgccgtgctggcgat 

cgacgcggaggcgcggctttacgcggctgagcgagtgaaggactgcgtcgcttgatccaa 

tcccccggcatcctgctcaccattctggcgttcgcgctggtgatcgggccgctcgtgttc 

ctgcacgagctgggacattatctggcgggccgcctcttcggggtgaaggccgaggaattc 

tcgatcggcttcggccgcgagatcgccggcaccaccgatcgccgcggcacgcgctggaag 

ttcagcctgttgccgctgggcggctatgtccgcttcgccggcgacatgaacccggcgagc 

cagccttcgcccgaatggctgcagaccagcccgggcc (SEQ ID NO: 95) 
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gtggtgaagcgcgtcacggtgttgggggcgaccggctcggtcggcacctcgacgctggat 
ctgatcgaacgaaatccgcacgccttcgaagtcgtggcgctgaccgcaaattgcgatgtc 
gagaagctggctgccgcggcgatccgcacgcgcgcgcgctgcgccgtggtcgccgacgag 
aaatgcctgccggcgctacaggagcggctggccggcagcggtgtcgaggcgatgggcggg 
gcgcattcggtgtgcgacgtggcgcggatgggtgctgactggacgatggctgcgatcgtc 
ggcagcgcagggctcaagccggtgatggccgcgctggaggccggtggcaccgtcgcgctc 
gcgaacaaggagtcgctcgtctcggcgggtgaggtgatgatggcggcggcccgcgcgcat 
ggcgcgacgctgctgccggtcgattcggagcacaatgcggtgttccagtgcctcgatcgc 
accgcgcccaggggcgtccgccggatcatccttaccgccagcggtggtccgttccgcgcg 
acgccgaaggaagcgatgcgcgacatcacccccgcacaggcggtggcgcatcccaactgg 
tcgatgggcgccaagatctcggtcgactccgcgacgatgatgaacaaggggctcgaactg 
atcgaagccttccacctgttcccggtcgccgccgagcaactggccgtgctggtccatcgc 
caatccgtcgtccattcgatggtggaatatgtcgacggatcggtgctggcccagctcggc 
acgcccgacatgcgcacgccgatcgcctatgcgctggcttggcccgagcggatggagacg 
ctgtgcccgccgctcgaccttgccacggtgggtaagctcgagttcgaaaatcccgatctc 
gatcgcttcccggcgctcgcgctggcgatggaggcattgaaggcgggcggggcgcgtccg 
gccattctcaatgccgccaacgaagtcgccgtcgcggcctttctcgccgggcggatcgga 
ttccttgaaattgccgcaatctctgccgatacgctgtctcgctatgacccggccgcgccg 
gaaacgctcgatgccgtgctggcgatcgacgcggaggcgcggctttacgcggctgagcga 
gtgaaggactgcgtcgcttga (SEQ ID NO: 96) 
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Figure 30 

1 vvkrvtvlga tgsvgtstld liernphafe vvaltancdv eklaaaairt 

51 rarcavvade kclpalqerl agsgveamgg ahsvcdvarm gadwtmaaiv 

101 gsaglkpvma aleaggtval ankeslvsag evmmaaarah gatllpvdse 

151 hnavfqcldr taprgvrrii ltasggpfra tpkeamrdit paqavahpnw 

201 smgakisvds atmmnkglel ieafhlfpva aeqlavlvhr qsvvhsmvey 

251 vdgsvlaqlg tpdmrtpiay alawpermet lcppldlatv gklefenpdl 

301 drfpalalam ealkaggarp ailnaaneva vaaflagrig fleiaaisad 

351 tlsrydpaap etldavlaid aearlyaaer vkdcva (SEQ ID NO: 97) 
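
The start of the amino acid sequence in Figure 30 can be checked against the start of the nucleotide sequence of SEQ ID NO: 96. The following is a minimal sketch, not part of the original figures, which assumes that SEQ ID NO: 97 is the conceptual translation of SEQ ID NO: 96 read with the standard genetic code. The names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for this illustration. Note that the initial gtg codon is translated literally as valine here, which is how Figure 30 shows it; in vivo an initiating gtg is normally read as methionine.

```python
# Standard genetic code (NCBI translation table 1), with codons
# enumerated in t, c, a, g order for each of the three positions.
BASES = "tcag"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a nucleotide string codon by codon, stopping at the first stop codon."""
    dna = "".join(dna.split()).lower()  # drop whitespace copied from the figure
    protein = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon
            break
        protein.append(residue.lower())
    return "".join(protein)


# First 60 nucleotides of SEQ ID NO: 96 and first 20 residues of
# SEQ ID NO: 97, copied from the figures.
seq96_start = "gtggtgaagcgcgtcacggtgttgggggcgaccggctcggtcggcacctcgacgctggat"
seq97_start = "vvkrvtvlga tgsvgtstld".replace(" ", "")

print(translate(seq96_start) == seq97_start)
```

The same helper applied to the last 21 nucleotides of SEQ ID NO: 96 (gtgaaggactgcgtcgcttga) gives vkdcva followed by a stop, which agrees with the end of the sequence shown in Figure 30.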
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Figure 31 (page 1 of 15) 



Stdxrcds 1 

Padxrd 1 at 

Zmdxrd 1 

Sgdxrd 1 

Nmdxrd 1 

Ecdxrd 1 

Sldxrd 1 

Mldxrd 1 

Pmdxrp 1 atgagtattagttat 

Atdxrd 1 atgatgacattaaactcactatctccagctgaatccaaagctatttcttt 

Cjdxrd 1 ; 

Pfdxrd 1 : 

Stdxrcds 1 gtgg 

Padxrd 3 gagt 

Zmdxrd 1 atga 

Sgdxrd 1 ttgg 

Nmdxrd 1 a 

Ecdxrd 1 a 

Sldxrd 1 <3 

Mldxrd 1 g 

Pmdxrp 16 ttta 

Atdxrd 51 cttggatacctccaggttcaatccaatccctaaactctcaggtgggttta 

Cjdxrd 1 ■ 

Pfdxrd 1 a 



Stdxrcds 5 

Padxrd 7 

Zmdxrd 5 

Sgdxrd 5 

Nmdxrd 2 

Ecdxrd 2 

Sldxrd 2 

Mldxrd 2 

Pmdxrp 20 

Atdxrd 101 gtttgaggaggaggaatcaagggagaggttttggaaaaggtgttaagtgt 

Cjdxrd 1 

Pfdxrd 2 
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Stdxrcds 5 " 

Padxrd 7 

Zmdxrd 5 

Sgdxrd 5 

Nmdxrd 2 

Ecdxrd 2 

Sldxrd 2 

Mldxrd 2 

Pmdxrp 20 

Atdxrd 151 tcagtgaaagtgcagcagcaacaacaacctcctccagcatggcctgggag 

Cjdxrd 1 

Pfdxrd 2 

Stdxrcds 5 tga ag 

Padxrd 7 cgaccgcag 

Zmdxrd 5 gtc ag 

Sgdxrd 5 tea 

Nmdxrd 2 tga ca 

Ecdxrd 2 tga ag 

Sldxrd 2 tga aa 

Mldxrd 2 tga acaatccgatcgaggggcacgctggcggccgcct 

Pmdxrp 20 tga aa 

Atdxrd 201 agctgtccctga gg 

Cjdxrd 1 

Pfdxrd 2 tga ag 

Stdxrcds 10 -eg c gtca-cggtgttgggggcgacc 

Padxrd 16 -eg g atca-gcgtgctcggcgcgacc 

Zmdxrd 10 -cc aagaacagtca-ctgttttaggggcgacc 

Sgdxrd 8 ttctcggctcgacc 

Nmdxrd 7 -ccacaagtc ctga-ccatattaggcagtacc 

Ecdxrd 7 -ca a ctca-ccattctgggctcgacc 

Sldxrd 7 -gc a gtga-cactgctcggttcaacc 

Mldxrd 39 ccg c gtgc-tggtgttgggaagtact 

Pmdxrp 25 -aa g atcg-ttattttaggttcaact 

Atdxrd 215 -eg c ctcgtcaatcttgggatggaccaaaacccatctc 

Cjdxrd 1 atga-tactttttggaagtacg 

Pfdxrd 7 aa-atatatttatatatatt 
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Figure 31 (page 3 of 15) 



Stdxrcds 

Padxrd 

Zmdxrd 

Sgdxrd 

Nmdxrd 

Ecdxrd 

Sldxrd 

Mldxrd 

Pmdxrp 

Atdxrd 

Cjdxrd 

Pfdxrd 



-ggctcggtcggcacctcgacgctggatc- 
-ggctcgatcggcctgagcaccctggacg- 
-ggatccattggtcattcaacactggatt- 
-ggctcgatcggcacccaggccatcgacg- 
-ggcagcataggcgaaagcacgctggacg- 
-ggctcgattggttgcagcacgctggacg- 
-ggctcgatcgggacacaaaccctagaca- 
-ggctcaattggcacccaggcgctggaag- 
-ggatcgattggtaccagtactttatccg- 



34 

40 

40 

22 

3 7 

31 

31 

64 

49 

252 tatcgttggatctactggttctattggcactcagacattggata- 



22 
26 



-ggc agtataggag 

-ttttct-tcatcacaataactattaatgatttag 



Stdxrcds 

Padxrd 

Zmdxrd 

Sgdxrd 

Nmdxrd 

Ecdxrd 

Sldxrd 

Mldxrd 

Pmdxrp 

Atdxrd 

Cjdxrd 

Pfdxrd 



-tgatcgaacgaaatccgcacgccttcgaagtcg- 
-tcgtccagcgtcatcccgatcgttacgaagcct- 
-taatcgaacggaatttagatcggtatcaggtca- 
-tggtgctccgcaaccccggccggttcaaggtgg- 
-ttgtctcccgccaccccgaaaaattccgcgtat- 
-tggtgcgccataatcccgaacacttccgcgtag- 
-ttcttgagcagtatcccgatcgctttcgcctcg- 
-ttatcgccgccaatccggaccgtttcgaggtag- 
-tgattacacataatcctgataagtaccaagtgt- 
-ttgtggctgagaatcctgacaaattcagagttg- 
-taaatgctcttaaacttgctgctttaaaaaaca- 



62 

68 

68 

50 

65 

59 

59 

92 

77 

296 

35 

59 taataaataatacatcaaaatgtgtttccattgaaagaagaaaaaataac 



-tggc 
-tcgc 
-tcgc 
-tcgc 
-tcgc 
-ttgc 
-tagg 
-tcgg 
-ttgc 
-tggc 
-ttcc 



Stdxrcds 

Padxrd 

Zmdxrd 

Sgdxrd 

Nmdxrd 

Ecdxrd 

Sldxrd 

Mldxrd 

Pmdxrp 

Atdxrd 

Cjdxrd 

Pfdxrd 



99 gct- 
105 cct- 
105 ttt- 

87 gct- 
102 gct- 

96 gct- 

96 get- 
129 get- 
114 gtt- 
333 tct- 

72 cat- 



-gaeegca- 
-gactggc- 
-gaccgcc- 
-gtccgeg- 
-ggcaggg- 
-ggtggca- 
-ggeggct- 
-ggcege — 
-agttggt- 
-agetget- 
-ttctget- 



■aattgc 
-ttcagc 
-aacege 
-geegge 
-cataag 
-ggcaaa 
-ggtcgt 

c 

-ggacgt 
-ggttcg 
-ttagct 



109 gcatatataaattatggtataggatataatggaccagataataaaataac 
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Stdxrcds 

Padxrd 

Zmdxrd 

Sgdxrd 

Nmdxrd 

Ecdxrd 

Sldxrd 

Mldxrd 

Pmdxrp 

Atdxrd 

Cjdxrd 

Pfdxrd 

Stdxrcds 

Padxrd 

Zmdxrd 

Sgdxrd 

Nmdxrd 

Ecdxrd 

Sldxrd 

Mldxrd 

Pmdxrp 

Atdxrd 

Cjdxrd 

Pfdxrd 

Stdxrcds 

Padxrd 

Zmdxrd 

Sgdxrd 

Nmdxrd 

Ecdxrd 

Sldxrd 

Mldxrd 

Pmdxrp. 

Atdxrd 
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115 gatgtcgag — aagctgg- 

121 cgcctggccgaactcgag — gcgctg— 

121 

103 

118 

112 

112 

139 

130 

349 

88 

159 



-aatgtcaaa — gatctgg — 
-ggcgcggtg — gagctgc — 
-caggtcgag — aaattgg — 
-aatgtc-ac — tcgcatg — 

-aatgtggcg — ctgtt 

-gggggcgcg — cagctggacacgc 

-aatgtagagctaatgttt c 

-aatgttact — ctacttg c 

-tgtggggat — aacatcg— 



-tgc 
-tgc 
-cga 
-cgc 
-ggc 
-tag 



-tgc 
-aac 
-t— 



-aaagagtag — aagatgt- 



-aaaagaataaagttatgc 



135 -cgcg gcgatc — cgcac-g-cgcgcgc-gctgc — g-c c 

148 -ctca ggcacc — gcccc-g-tctatgc-ggtggt-g-c c 

141 -tgcg gcgaaa — agaac-g-aatgcca-agcgg — g-c --g 

123 -cgag caggccgtcgcactg-ggcgtgc-acacc — g-t c 

138 tcaat gtcaaa — cgttc caccccg-aatat — g-c c 

131 -aaca gtgcct — ggaat-t-ctctccccgctat — g-c c 

126 g tcggag — caaat-t-cggcggc-accga — c-c a 

164 -tgag gc agcgc-gccgc — gac c 

152 -aatgtttgacatt — ccaac-c-gtcgttt-gctgc — g-ttagatgac 

367 gate — aggta-a-ggagatt-taagectg-c a 

106 -cttt taaatg — ageaa-ategcaagg-tttaa — a-c c 

193 -aaaa aggat ttaa-t-agatatt-ggtgc — a-a 1 

166 gtggtcgc — eg ac ga gaaatgc 

180 ggagcagg — cc gc gg egattge 

172 gttatege— tg ac cc gtcgett 

157 gcggtggc — eg acccggccgccga ggaagccg — 

169 gtcgttgc — eg at gc cgaa — c 

163 gtaatgga — eg at gaagcgagtgcgaaactt 

154 gagattgtggcg at tc aagatgeage 

184 ggegtcac — ca at ate gecateg 

193 gatgtege — ag cc aaaatgt 

394 ttggttgc — tgttagaaac ga gtcactg 

138 caaatttg — tt tc ca taaaaga 

222 aaagaaac — ca at taatgta gcaattt 
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187 ctg— 

201 ctt— 

193 tat— 

188 ctg-- 

188 acg— 

193 ctt— 

180 tcagctg — 

206 ctg-- 

212 tgg— 

421 att— 

159 tt— 



-c eg gc — gctacagg ageggctg — 

-g ca gg — get-cget cgccgc-g — 

-a at ga — tctgaaag aggctttg — 

-c gc ga — ggccctggcggccaaggcgcag — 

-c eg cccggcttgaag ccctgttgaa 

-aaaacg at — gctacagc aacag 

-t eg ga — actgeaag eggegate — 

-a egatege — gc gg ctcagctg — 

-c agaga aactgaaa — 

-a at ga — gcttaaag aggcttta — 

c — aaaaaata agcattta — 

— tacgaatg — 



248 ttggaagtac tg gt — agtatagg- 



211 gec ggcagegg 

223 gcg ggtatccg 

217 gec ggaagctc 

218 ggc gcccgctg 

216 aegegae ggca-egg 

217 ggtagccg 

208 gca gaccttga 

229 gec ggc 

229 gec caccaa 

445 get gatttgga 

178 gtt aaacacga 

278 ctttaaatataataagggagtgtaataaaattgaaaatgtttttaa 



222 tg— 

234 ca— 

228 tg 

229 cc— 

230 eg 

225 ca 

219 ta— 

235 -g— 

238 ag 

456 ctata 
189 ta— 
324 tg-- 



tcg-ag gegat-gggegggge- 

— ccc-gg gtgct-gttcggcga- 

— ttg-ag gcagc-cgcgggtgc- 

— gcg — g gtgct-ggcgggccc- 

— actcag gtttt-acacggcgc- 

— ccg-aa gtctt-aagtgggca- 

— atc-cg ccgct-catcctgac- 

— aca-tc cctta-ccacgggac- 

— cca-aacaacagtctt-agcaggaca- 

aactcg-ag attat-tccaggaga- 

gag-tt tttatagggcaagaa- 



-gca 
-gca 
-tga 
-gga 
-gca 
-aca 



-cga 
-gca 
-gca 
-ggt 



--tta-aa gcatt-gtatgtgaataagagtgtgaatgaa 
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246 ttcgg- 
258 ggcgt- 
252 tgcct- 
252 cgcgg 
255 ggcat 
249 agccg 
240 — egg 
258 tgcg- 
267 ageca 
486 aggag 



-tgtgcgacgtggc — 
-tgtgcgaagtggc — 
-tggtcgaagccgc — 
-cgaccgagctggcc- 



-g egga 

-c ag 

-c atga 

-g egge 

ccga 

cget 

egga 

egge 

caca 

— cgac 

214 ttagagcaaattttaacagaatgt c aaga 

361 ttata tgaacaagctaga gaatttttaccagaatatttgt 



-tggttgacgt tgcct ctg 

-ettgegatatggea g 

-tgaggcaggtgtc a 

gtcac c 

-tttgtgagttagc gg 

-tgattgaggttgc c 



269 

279 

275 

276 

282 

273 

261 

272 

291 

509 

243 

401 gtatacatgataaaagtgtatatgaagaattaaaagaactggtaaaaaat 



269 

279 

275 

276 

282 

273 

261 

272 

291 

509 

243 

451 ataaaagattataaacctataatattg tg — tggtga — 



gg — tgctga — 

eg — cccgaa — 

gg — tgeega — 

ag — tgec-a — 

aa — gtcag 

aggatgttga — 

-agtggctcgctacgg — tgatgc — 

-tg gt — tgaggaga 

-tcct ga — agcaga — 

-at cc — tgaagc — 

-ta ag — ctttta — 



-tg- 
-cg- 
-tg- 
-gg- 
-cg- 
-tg- 
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279 ctg- 
289 gtg- 



285 
285 
291 
285 
282 
284 
303 



-gacg atgg- 

-gacatggtaatgg- 

-gaca atgg- 

-ggtg ctga- 

-tgtc atgt- 

-ggtg atgg- 



-c- 
-c- 
-c- 
-a- 
-g- 
-c- 



-tgcg— 
-ggcc — 
-agcc — 
-cggc — 
-cgcc — 
-agcc — 



-ate gtegge 

-ate gtegge 

-att ateggt 

-ate aceggt 

-ate gtcggg 

-att gttggc 

-att gtcggt 



ttg— 
etc — 
egg— 
tea — 

cga gatt gtggtcac tggc — 

ctgaggctgacg ttgt cctcaatgcg ctg gtcggg 

tat ggta atgg c tgcg att gtgggg 

519 tgt aacc gttg 1 taceggaata gtaggt 

253 etc aa tgee att gtaggt 

48 6 tga aggg atga a agaa atatgtagtagta 

304 agcgcagggctcaagccggtgatgg 

319 gccgccgggctgccgtcgaccctgg 

310 tgcgccggtctaaaagcgacgcttg 

310 tcgatcggcctggccccgacgctgg 

316 gcggtggggctgccttccgcgctcg 

310 gctgctgggctgttacctacgcttg 

310 tgcgctggtctgctacccacgatcg 

319 gcattgggtctgcgacccacactgg 

328 gcggcgggattattgcctactttgt 

547 tgtgcgggactaaagcctacggttg 

271 tttgcaggacttaaaagcactttaa 

515 atagtatagataaaatagttattggtattgattcttttcaaggattatat 



329 -ccgcgctggaggccggtggcacc- 
344 -cggccgtcgaggccggcaagcgc- 
335 -cagctattcgcaagggcaaaacg- 
335 -ccgcgctgcgggccggccgggtg- 
341 -cagcggcgcaaaaaggcaaaacc- 
335 -ctgcgatccgcgcgggtaaaacc- 
335 -ccgcgatcgaagccggcaaggat- 
344 -ctgcactgcacacgggcgcgcga- 
353 -ctgcggtgaaagctggaaaacgt- 
572 -ctgcaattgaagcaggaaaggac- 
296 -aggctaaagagcttggcaaaaac- 



-gtcgcgctcgcgaacaa 
-gtactgctggccaacaa 
-gtcgctttagcgaataa 
-ctggtgctggcgaacaa 
-atttatctggcgaacaa 
-attttgctggccaataa 
-atcgcccttgccaacaa 
-ttggcgttggccaacaa 
-gtactattagcaaataa 
-attgctcttgcaaacaa 
-atagctttagctaacaa 



565 tctactatgtatgcaattatgaataataaaatagttgcgttagctaataa 
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Stdxrcds 369 ggagtcgctcgtctcggcgggtgaggtgatgatgg-cggcggcccgc-gc 

Padxrd 384 ggaggcgctggtgatgtccggcgcgctgttcatgc-aggcggt-caa-gc 

Zmdxrd 375 ggaatccttagtttcagctggcggattgatgatcg-atgccgtgcgg-ga 

Sgdxrd 375 ggagtcgctgatcgtcggcggtccgctggtgaagg-cggtg gc 

Nmdxrd 381 agagacgctggtggtttccggcgcgttgtttatgg-aaaccgcccgt-gc 

Ecdxrd 375 agaatcactggttacctgcggacgtctgtttatggacgccgtaaagcaga 

Sldxrd 375 agaaaccctgattgcagcaggcccagtggtcctgc-cactcctgcaa-aa 

Mldxrd 384 ggaatcgctggtagctggcggttcgctggtgttgg-ccgcggcgc a 

Pmdxrp 393 agaagccttggtaacttgcgggcaattatttattg-atgcagtgcgt-ga 

Atdxrd 612 agagacattaatcgcaggtggtcctttcgtgcttc-cgcttgccaac-aa 

Cjdxrd 336 agaaagtcttgtagtagctgg-gagtttttt 

Pfdxrd 615 agaatccattgtctctgctggtttctttttaaaga-aattattaaat-at 

Stdxrcds 417 gcat-ggc gcgacgctgctgccggtcgattcggagcacaatgcggtg 

Padxrd 431 gcagcggc gcggtgctcctgccgatcgacagcgagcacaacgcgatc 

Zmdxrd 423 acat-ggc acgacgcttctccccgtcgattccgagcataacgctatt 

Sgdxrd 417 gcag-ccc ggccagatcgtgccggtggactccgagcacgccgcgctg 

Nmdxrd 429 aaac-ggc gcggcagtgctgcccgtcgacagcgaacacaacgccgtt 

Ecdxrd 425 gcaa-agc gcaat--tgttaccggtcgatagcgaacataacgccatt 

Sldxrd 423 gcac-ggt gtcaccattacgcctgccgactccgagcactccgcgatc 

Mldxrd 429 gcca-ggc caga tcgtgcccgtagactcggaacactccgcgctg 

Pmdxrp 441 atct-caa gcacaattgttaccagtagatagtgaacataatgcgatt 

Atdxrd 660 acat-aat gtaaagattcttccggcagattcagaacattctgccata 

Cjdxrd 366 gaaa-ggg gctaaatttttacccgttgatagtgagc atgcagct 

Pfdxrd 663 tcat-aaaaatgcaaagataatacctgttgattcagaacatagtgctata 



Stdxrcds 463 ttccag t gc ct cg--at 
Padxrd 478 ttccag t cg ctgccgcgcaattatgccg--at 
Zmdxrd 469 ttccaa t gc tt c c 
Sgdxrd 463 ttccag g cg ct gg--cc 
Nmdxrd 475 ttccaagtttt gc cg cgcgat 
Ecdxrd 469 tttcag a g t tt--ac 
Sldxrd 469 tttcag t gc at cc--aa 
Mldxrd 472 gcgcaa t gc ctgcg cg--gt 
Pmdxrp 487 ttccaa tcccttccgc ct ga--ag 
Atdxrd 706 tttcag t gt at t 
Cjdxrd 409 ttaaaa t ttttact cg-- aa 
Pfdxrd 712 tttcaa t gt tt ag-ataata 
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Stdxrcds 478 cg ca ccgc 
Padxrd 508 gg cc tgga 
Zmdxrd 482 cg catcataa ccgc 
Sgdxrd 478 gg cg gcgc 
Nmdxrd 496 ta ca cagg 
Ecdxrd 482 cg ca acct 
Sldxrd 484 gggctttca acccatg 
Mldxrd 490 gg ta cc 
Pmdxrp 509 cg caaagacaaattgggttttgcccgc 
Atdxrd 718 ca ag gttt 
Cjdxrd 427 gg ta aaaa 
Pfdxrd 731 ataaggtattaaaaaca aa atgt- -- 



Stdxrcds 486 gcccagg ggcg tccgccg ga 
Padxrd 516 gcgggtc ggcg-- tgcgccg ga 
Zmdxrd 496 gacta tg ttcgccg ga 
Sgdxrd 486 ccgcgcg gagg tccgcaa gc 
Nmdxrd 504 tcgcctg aacg aacacgg ca 
Ecdxrd 490 atccagcataatct-ggga tacgctgaccttga 
Sldxrd 500 ctgattttcggcctgctcaagtcgtggcagggc tgcgacg ga 
Mldxrd 496 cccgac gaag ttgctaa gt 
Pmdxrp 536 tttctgaatta ggga tcagtaa ga 
Atdxrd 726 gcctgaa ggcgctctgcgcaa ga 
Cjdxrd 435 aaatata gcaa aacttta ta 
Pfdxrd 754 ttacaag acaa tttttct aa 



Stdxrcds 506 tc a tccttacc 
Padxrd 536 tc c tcttgacc 
Zmdxrd 512 tt a ttattacg 
Sgdxrd 506 tg g tggtgacc 
Nmdxrd 524 tcgcttcgatt a tcctgacc 
Ecdxrd 522 gc aaaa t ggcg tggtgtccattttacttacc 
Sldxrd 542 tt c tcctgact 
Mldxrd 515 ta g tgctaacc 
Pmdxrp 560 tt g tgttaacg 
Atdxrd 749 ta a tcttgact 
Cjdxrd 455 tc aca 
Pfdxrd 774 aattaacaatataaataaaata tttttatg 
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Stdxrcds 517 gccagc-ggtggtccgttccgcg--cg acgccgaaggaagcgatgc 
Padxrd 547 gcctcc-ggcggcccgttccgcg--ag acgccgctgga-gcaactc 
Zmdxrd 523 gccagc-ggaggtcccttcagaa--ca acgtctcttgccgaaatg- 
Sgdxrd 517 gccagc-ggcggcccgttccgcaaccg cacccgtgagcagc--tgg 
Nmdxrd 544 gcttcc-ggcggcccgtttctga--c cgccgatttaaac-acgt 
Ecdxrd 553 gggtct-ggtggccctttccgtg--ag acgcc--attgcgcgattt 
Sldxrd 553 gccagt-ggcggcgcttttcggg--ac tggccggtcgaacggctgt 
Mldxrd 526 gcctcc-ggcgggccgtttcgtg--gctggaacgccg-gcgacttggagc 
Pmdxrp 571 ggatcc-ggtggtccattccgtt--at acccctctgga-gcaattt 
Atdxrd 760 gcatct-ggtggagcttttaggg--at tggcctgtcgaaaagctaa 
Cjdxrd 460 gcaagt-ggtggagctttttata--gg tataaaatcaaagatttaa 
Pfdxrd 804 ttcatctggaggtccatttcaaa--at ttaactatggacgaattaa 

Stdxrcds 560 gcg-ac--a-tca cccccgcacaggcggtggcg-catcccaactggt 
Padxrd 589 gct-tc--ggtga cgccggagcaggcttgtgcg-cacccgaactggt 
Zmdxrd 565 gca-ac--ggtca cgccagaacgcgcggttcag-catcccaactggt 
Sgdxrd 560 cgg-cc--g-tca cgccggccgacgcgctggcg-cacccgacctggg 
Nmdxrd 584 tcg-ac--a-gcattacgcccgaccaagcggtcaaa-caccccaattggc 
Ecdxrd 594 ggc-aacaa-tga cgccggatcaagc-ctgccgtcatccgaactggt 
Sldxrd 596 cgc-aa--g-taa ctgtcgcagatgcgctcaag-catcccaactggt 
Mldxrd 572 gcg tta cacccgagcaggcgggcgtc-catccgacttggt 
Pmdxrp 613 gaacag--a-tca ccccagcacaagcagttgcg-catcctaattggt 
Atdxrd 803 agg-aa--g-tta aagtagcggatgcgttgaag-catccaaactgga 
Cjdxrd 503 atc-aa--g-tca gtgtcaaagatgctttaaaa-catcctaattgga 
Pfdxrd 848 aaa-at--g-taa catcagaaaatgctttaaag-catcctaaatgga 

Stdxrcds 602 cgatgggcgccaagatctcggtcgactccgcgacgatgatgaacaagggg 

Padxrd 632 cgatggggcgtaagatttccgtcgactccgccagcatgatgaacaagggg 

Zmdxrd 608 caatgggtgccaagatttctatcgattctgctacaatgatgaataagggg 

Sgdxrd 602 cgatgggcccggtggtgacgatcaactcggcgaccctggtgaacaagggc 

Nmdxrd 629 gtatgggacgcaaaatctccgtcgattccgccaccatgatgaacaaaggt 

Ecdxrd 638 cgatggggcgtaaaatttctgtcgattcggctaccatgatgaacaaaggt 

Sldxrd 638 cgatggggcgcaagattaccgtcgactccgccaccttgatgaataaaggc 

Mldxrd 611 caatggggacgatgaacacgctgaactcagcgtctctggttaacaagggg 

Pmdxrp 656 caatggggaaaaagatctctgtcgattccgctaccatgatgaataaaggg 

Atdxrd 845 acatgggaaagaaaatcactgtggactctgctacgcttttcaacaagggt 

Cjdxrd 545 acatgggagcaaagatcactatagatagtgcgactatggcaaataagctt 

Pfdxrd 890 aaatgggtaagaaaataactatagattctgcaactatgatgaataaaggt 
Stdxrcds 652 ctcgaactgatcgaagccttccacctgttcccggtcgcc — gccgagcaa 

Padxrd 682 ctcgaactgatcgaggcgtgctggctgttc gacgcccagccgagcca 

Zmdxrd 658 cttgaattgatagaagcctatcatctcttccagattcca — ttagaaaaa 

Sgdxrd 652 ctggaggtgatcgaggcgcacctgctgtacgacgtgccg — ttcgaccgg 

Nmdxrd 679 ttggagctgattgaagcgcattggctgttcaactgtccg — cccgacaaa 

Ecdxrd 688 ctggaatacattgaagcgcgttggctgtttaacgccagc — gccagccag 

Sldxrd 688 ctcgaggtgatcgaagcccactatctcttcggcttggat — tacgactac 

Mldxrd 661 ctcgagctcatcgaagccaacctgttgttcggcattccc — tacgaccgc 

Pmdxrp 706 ttggaatatattgaagcacgctggttatttaatgcctcg — gcagaagaa 

Atdxrd 895 cttgaggtcattgaagcgcattatttgtttggagctgag — tatgacgat 

Cjdxrd 595 tttgagattatagaggcttatcatttat atgat — tttaaagaa 

Pfdxrd 940 ttagaggttatagaaacccattttttatttgatgtagat — tataatgat 

Stdxrcds 700 c-tggccgtgctggtccatcgccaatccgtcgtccattcgatggtggaat 

Padxrd 729 ggtcgaggtggtgatccacccgcagagcgtgatccactcgatggtggact 

Zmdxrd 706 t-ttgaaattttggttcatcctcagtcagttattcactccatggtggaat 

Sgdxrd 700 a-tcgaggtggtggtccatccgcagtcggtcgttcattcgatggtggaat 

Nmdxrd 727 c-tcgaagtcgtcatccatccgcaatctgtgatacacagcatggtgcgct 

Ecdxrd 736 a-tggaagtgctgattcacccgcagtcagtgattcactcaatggtgcgct 

Sldxrd 736 a-tcgacatcgtcatccatccccagagcatcatccactcgctgattgagc 

Mldxrd 709 a-ttgaggtggttgtgcaccctcagtcaattgttcattcgatggtgacat 

Pmdxrp 754 a-tggaagttattattcatcctcaatccattattcattctatggtacgtt 

Atdxrd 943 a-tagagattgtcattcatccgcaaagtatcatacattccatgattgaaa 

Cjdxrd 637 a-ttgatgctttaatagaaccaagatctttagtgcatgcaatgtgtgaat 

Pfdxrd 988 a-tagaagttatagtacataaagaatgcattatacattcttgtgttgaat 

Stdxrcds 749 atgtcgacggatcggtgctggcccagctcggcacgcccgacatgcgcacg 

Padxrd 779 acgtcgacggttcggtgatcgcccagctcggcaatccggacatgcgcacg 

Zmdxrd 755 atttggatggttctatccttgcccagatcggtagtcctgatatgagaaca 

Sgdxrd 749 tcgtggacggttcgacgatggcccaggccagcccgccggacatgcgcatg 

Nmdxrd 776 accgcgacggctccgtgttggcgcaactgggcaatcccgatatgcgaacg 

Ecdxrd 785 atcaggacggcagtgttctggcgcagctgggggaaccggatatgcgtacg 

Sldxrd 785 tagaagatacctccgtcttggcgcaattgggctggccggatatgcgactg 

Mldxrd 758 tcatcgacggctcgacgatcgcccaagccagccctccggacatgaagcta 

Pmdxrp 803 acatcgatgggtccgtgattgctcaaatggggaatcctgatatgcgtaca 

Atdxrd 992 cacaggattcatctgtgcttgctcaattgggttggcctgatatgcgttta 

Cjdxrd 686 ttaaaaatggagctagcacggcgtatttttcaaaagcagatatgaaacta 

Pfdxrd 1037 ttatagacaaatcagtaataagtcaaatgtattatccagatatgcaaata 
Stdxrcds 799 ccgatcgcctatgcgctggcttggcccgagcgga 1 g 

Padxrd 829 ccgatttcctatgccatggcctggccggagcgaa 1 c 

Zmdxrd 805 ccgatcggtcatactttggcttggccaaagcgga 1 g 

Sgdxrd 799 ccgatcgcgctgggcctcggctggccggaccggg 1 g 

Nmdxrd 826 cctatcgcttattgtttgggtttgcccgagcgca 1 c 

Ecdxrd 835 ccaattgcccacaccatggcatggccgaatcgcg 1 g 

Sldxrd 835 cccttgctctacgccctctcctggcccgatcgcc 1 c 

Mldxrd 808 cctatttctttggcgttgggctggccacagcggg 1 g 

Pmdxrp 853 ccgattgcggaaaccatggcatatccaagtcggaccgtt g 

Atdxrd 1042 ccgattctctacaccatgtcatggcccgatagag ttccttgttctg 

Cjdxrd 736 gctatttcagatgctatattt gaaaaac a a 

Pfdxrd 1087 cccatattatattctttaacatggcctgatagaa 1 a 

Stdxrcds 835 gag-acgc tgtgccc gccgc-t-cgaccttg ccac 

Padxrd 865 gat-tccg gcgtttc gccgc-t-ggatatgt — tcgc 

Zmdxrd 841 gaa-acac cagccga atcgt-t-ggatttta ccaa 

Sgdxrd 835 ccggacgc cgccc ccggc-tgcgactgga ccaa 

Nmdxrd 862 gat- 1 cgg gtgtcg gcgacct-ggatttcg acgc 

Ecdxrd 871 aa c tctggcgtgaagccgc-t-cgattttt gcaa 

Sldxrd 871 tct-actc aatggtc ggcgc-t-cgatctgg tcaa 

Mldxrd 844 g gtg-gc gctgc-t-cgagcctgtgctttcactac 

Pmdxrp 893 ctg-gcgt tgagccc t-t-ggattttt acca 

Atdxrd 1088 aag-taac t-tggcc aagac-t-tgaccttt gcaa 

Cjdxrd 766 gat-acgcctattttaga ggctg-t-tgatttta gca- 

Pfdxrd 1123 aaa-acaa atttaaa acctt-t-agatttgg ctca 

Stdxrcds 867 ggtgggtaagctcgagttcgaaaatcccgat ctcgatcgct tc 

Padxrd 897 cgtcggtcgcctggatttccagcgccccgacgagcagcgcttc 

Zmdxrd 873 attgcgccagatggattttgaagcaccagattatgaacgtttt 

Sgdxrd 867 ggccgcgacctgggagttcttcccgctggacaacgaggcgtt c 

Nmdxrd 894 attgtccgcgctgaccttccaaaagcccgactttgaccgcttc 

Ecdxrd 903 actaagtgcgttgacatttgccgcaccggattatgatcgttat 

Sldxrd 903 agcgggcagcttggagttccgggaaccggatcacgccaaatac 

Mldxrd 876 cgcatctacctgggaattcgagccgctggacatcgatgttttt 

Pmdxrp 921 actgaatggattaacctttattgagccagactatcaacgttat 

Atdxrd 1119 actcggttcattgactttcaagaaaccagacaatgtgaaatac 



Pfdxrd 1155 ggtttcaactcttacatttcataaaccttctttagaacatttc 
Stdxrcds 910 ccggcgctcgcgctggcgatggaggcattgaag-gcgggcggggcgc 

Padxrd 940 ccctgcctgcgcctggcgagccaggccgcggaa-accggcggcagcg 

Zmdxrd 916 ccggcattaactttggcaatggaatccatcaaa-tcaggtggggctc 

Sgdxrd 910 ccggcggtcgagctggcccgcgaggtgggtacg-ctcggcgggaccg 

Nmdxrd 937 ccctgcctgaagctcgcctatgaagccatgaac-gcaggcggagccg 

Ecdxrd 946 ccatgcctgaaactggcgatggaggcgttcgaa-caaggccaggcag 

Sldxrd 946 ccctgcatggacttggcctacgccgccggtcgc-aaaggcggcacaa 

Mldxrd 919 cccgcagtcgagctggcccggcacgctggacag-atcggcggctgta 

Pmdxrp 964 ccttgtttaaaattagctattgacgcattttca-gccggacaatatg 

Atdxrd 1162 ccatccatggatcttgcttatgctgc-tggacgagctggaggcacaa 

Cjdxrd 841 tatcctatttttaagcttaaaaatacatttttaaaa-gagccaaatttag 

Pfdxrd 1198 ccgtgtattaaattagcttatcaagcaggtata-aaaggaaactttt 

Stdxrcds 956 gtccggccattctcaatgccgccaacgaagtcgccgtcgcggcctttctc 

Padxrd 986 ccccggccatgctgaatgccgcgaacgaggtggccgtggccgcatttctc 

Zmdxrd 962 gtcctgctgtaatgaatgccgctaatgaaatagctgtggcggccttcctt 

Sgdxrd 956 ccccggcggtcttcaatgccgccaacgaggaatgtgtggacg-ctttcct 

Nmdxrd 983 cgccctgcgtattgaacgccgccaacgaagccgccgtcgccgcctttttg 

Ecdxrd 992 cgacgacagcattgaatgccgcaaacgaaatcaccgttgctgcttttctt 

Sldxrd 992 tgccagccgtcttgaatgcggcgaatgagcaagccgtcgccctcttccta 

Mldxrd 965 tgaccgccatttacgatgctgctaatgaggaggctgcagaggccttcctc 

Pmdxrp 1010 ccacgacagcaatgaatgcagcgaatgaaatcgcggtagcgtctttctta 

Atdxrd 1208 tgactggagttctcagcgccgccaatgagaaagctgttgaaatgttcatt 

Cjdxrd 890 gt gttatcatcaatgctgctaatgaagttggtgtttataatttttta 

Pfdxrd 1244 atccaactgtactaaatgcgtcaaatgaaatagctaacaacttatttttg 

Stdxrcds 1006 gccgggcggat c ggattccttgaaa-ttgccg 

Padxrd 1036 gagcggcacat c cgcttcagcgaca-tcgcgg 

Zmdxrd 1012 gataagaaaat c ggttttcttgata-tcgcta 

Sgdxrd 1005 gaagggcgcactgcccttcacc ggaatcgtggaca-ctgtgg 

Nmdxrd 1033 gacggacagat 1 aagtttaccgaca-ttgcca 

Ecdxrd 1042 gcgcaacaaat c cgctttacggata-tcgctg 

Sldxrd 1042 gaggagcaaat 1 cacttctcggata-ttccgc 

Mldxrd 1015 caaggtcggat c ggcttccccgcca-tcgtcg 

Pmdxrp 1060 gacaataagat 1 aaattcacagata-ttg 

Atdxrd 1258 gatgaaaagat aagctatttggatatcttcaaggttgtgg 

Cjdxrd 937 gaaaataaaag 1 ggatttttagaca-ttgcta 

Pfdxrd 1294 aataataaaat 1 aaatattttgata-tttcct 
-aatctctgccg atacgctgtctcgctatgac ccgg — 

-tatcatcgagg acgtgctgaaccgcgaggcg gtga — 

-aattgtcgaga aaacattagatcattataca cccg — 

-gaaggtggtcgccgaacacggcacaccgcaat cgg — 

-aaccgtcgccc attgtctttcac aagacttttcaga — 

-a ctggaaaaaatggat atgc — 



1037 c c 

1067 t 

1043 a c 

1046 c 

1064 a c 

1073 cgttgaatttatccgt- 

1073 g cctgattgaac gtgcctgcgatcgccaccaa acgg — 

1046 c aacaatcgcgg atgtgttgcagcgtgccgac caat — 

1088 cgcgacta aatcagttagtcgtgagcaa attg — 

1298 a attaacatgcg ataaac atcgaaacgag ttggta 

968 a atgcattttta aagcccttgatcattttgga gtac — 

1325 ctat-aatatcgcaag ttcttgaatctttcaattct caaa — 



1073 -ccgcgcc g gaaacgc tc g atg 

1103 -ccgcagt c gaatcgc tc g atc 

1079 -caacccc g tcttctt tg g aag 

1082 -gaacttc g ctcacgg tg g agg 

1101 -cggcata g gcgac-a ta g ggg 

1109 -gcgaacc a caatgtg tg g acg 

1109 -agtggcaacag caaccga gcttgg atg 

1082 -gggctcc c caatggg gt g agggac 

1120 -caaccac a aaaaattcattgcata g aag 

1333 acatcacc gtctcttgaagaga tt gttcactatg 

1004 -ctaaaat 1 tcaagca ta g aag 

1364 -aggtttc g gaaaata gt g aag 



1094 
1124 
1100 
1103 
1121 
1130 
1136 
1106 
1148 
1367 
1025 
1385 



ccg tgctggc g- 

agg tcctggctgccg- 

atg tctttgc g- 

acg tac 

ggc tcttggc g- 



atcga cgc — gga 

atcgc cgc 

atcga caa — tga 

tcca cgc — gga 

caaga tgcccgga 

atg tgttatc 1 gttga tgc — gaa 

aca ttttggc c tacga cgc — ttg 

ccgctactgtggat g atgta ctc ga 

atg tacttga g gtaga taa — aaa 

act tgtgggc a cgtgaatatgccgc — gaa 

aag tttttga g tatga 

att taatgaa gcaaattctacaaataca ttc — ttg 
Stdxrcds 1116 g-gcgcggc — tttacgcggctgagcg-agtg 

Padxrd 1147 — gcgcg ttcggtcgccgggca-atgg 

Zmdxrd 1122 a-gcgcgga — tacaagccgctgcttt-aatg 

Sgdxrd 1119 gagctgggc ccgggcccgggcc-cgcg 

Nmdxrd 1145 c-acgcgca — caagcgcgg gca-ttta 

Ecdxrd 1152 c-gcgcg tgaagtcgccaga aaag 

Sldxrd 1158 g-gcacggcagtttgtgcaagctagct-atca 

Mldxrd 1131 c-gcgcagc— gctgggcccgtgagcg-agcgttgtgtgcggtagcaaca 

Pmdxrp 1170 g-gcaagggaattatctcagtcaatca-tttt 

Atdxrd 1395 t-gtgcagc— tttcttctg--gtgct-aggc 

Cjdxrd 1041 ttttaaaacaagagagtattt 

Pfdxrd 1419 g-gccaaag — ataaagctaccgatat-atac 

Stdxrcds 1144 aag gactgc—gtcg cttga 

Padxrd 1171 ttg acccgg— cacg ccggctag 

Zmdxrd 1150 gag agtttg—cccg cgtga 

Sgdxrd 1145 a gctggcggccg- gctga 

Nmdxrd 1169 tcg gcacac— tgcg c-tga 

Ecdxrd 1175 agg tgatgc— gtct cgcaagctga 

Sldxrd 1188 aagtctggaatcc — gtcg tttag 

Mldxrd 1177 gcgagttctggaaag gtctct— gacatggtcttagaaaggtccta 

Pmdxrp 1200 aag tttttc— acat ccgtaa 

Atdxrd 1421 cag ttc— at-g catga 

Cjdxrd 1062 aag ga gttaa 

Pfdxrd 1447 aac aaacat — aatt cttcatag 
1162 
1186 
1198 
1210 
1221 
1219 
1435 
1072 
1468 
Stdxrp 5 vtvlgatgsvgtstldlie rnphaf ewalta 

Zmdxrp 7 vtvlgatgsighstldlie rnldryqvialta 

Padxrp 7 isvlgatgsiglstldwq rhpdryeaf altg 

Ecdxrp 4 ltilgstgsigcstldwr hnpehf rwalva 

Nmdxrp 6 ltilgstgsigestldws rhpekf rvf alag 

Hidxrp 6 ivilgstgsigkstlsvie nnpqkyhaf alvg 

Ssdxrp 5 isilgstgsigtqtldivt hhpdaf qwglaa 

Pmdxrp 10 ivilgstgsigtstlsvit hnpdkyqvf alvg 

Sldxrp 4 vtllgstgsigtqtldile qypdrf rlvglaa 

Sgdxrp 1 mvilgstgsigtqaidwl rnpgrf kwalsa 

Bsdxrp 4 icllgatgsigeqtldvlr ahqdqf qlvsmsf 

Mldxrp 15 vlvlgstgsigtqalevia anpdrf ewglaa 

Mtdxrp 23 evtnstdgradgrlrvwlgstgsigtqalqvia dnpdrf ewglaa 

Atdxrp 83 isivgstgsigtqtldiva enpdkfrwalaa 

Cjdxrp 1 milf gstgsigvnalklaa lk — nipisalac 

Pfdxrp 80 vaif gstgsigtnalniirecnkienvfnvkalyv 

Stdxrp 37 -n-cdveklaaaairtrarcawadekclpalqerla — g s g 

Zmdxrp 39 -n-rnvkdladaakrtnakraviadpslyndlkeala — g s s 

Padxrp 39 -f-srlaelealclrhrpvyavvpeqaaaialqgsla — a a g 

Ecdxrp 36 -g-knvtrmveqclef spryavmddeasakllktmlqqqg s r 

Nmdxrp 38 -h-kqveklaaqcqtfhpeyawadaehaarleallkrdg 1 a 

Hidxrp 38 -g-knveamf eqcikf rphf aalddvnaakilrekli — a h h 

Ssdxrp 37 -g-gnvallaqqvaef rpeivairqaekledlkaava — el 1 d 

Pmdxrp 42 -g-rnvelmfqqcltf qpsfaaldddvaakmlaeklk — ahq — s q 

Sldxrp 36 -g-rnvallseqirrhrpeivaiqdaaqlselqaaia — did — n p 

Sgdxrp 33 ag-gavellaeqavalgvhtvavad paaeeaaar-g p g 

Bsdxrp 36 -g-rnidkavpmievf qpkfvsvgdldtyhklkqmsf — s f e 

Mldxrp 47 -ggaqldtllrqraatgvtniaiaddra aqla — g dipyhg 

Mtdxrp 70 -ggahldtllrqraqtgvtniavadehaaqrvgd 

Atdxrp 115 -g-snvtlladqvrrf kpalvavrneslinelkeala — d 1 d 

Cjdxrp 31 -g-dniallneqiarf kpkfvsikdsknkhlvkhdrv — f i g 

Pfdxrp 115 -n-ksvnelyeqaref lpeylcihdksvyeelkelvk — nikdyk p 
Stdxrp 75 --v — ea — mg gahsvcdva-rm-g-adwtmaa-ivgsaglk 

Zmdxrp 77 — v — ea — aa gadalveaa-mm-g-adwtmaa-iigcaglk 

Padxrp 77 — i — rtrvlf geqalceva-sa-pevdmvmaa-ivgaaglp 

Ecdxrp 76 — t — ev — ls gqqaacdma-aled-vdqvmaa-ivgaagll 

Nmdxrp 78 — t — qv — lh gaqalvdva-sa-devsgvmca-ivgavglp 

Hidxrp 76 — iptev — la grraicelaahp-d-adqimas-ivgaagll 

Ssdxrp 76 — y — qp — myw geegweva-ry-gdaeswtg-ivgcagll 

Pmdxrp 82 — t — tv — la gqqaicelaahp-e-admvmaa-ivgaagll 

Sldxrp 76 — p — li — lt geagvteva-ry-gdaeiwtg-ivgcagll 

Sgdxrp 69 — g — qg — agrplprvlagpdaatela-aa-e-chsvlng-itgsigla 

Bsdxrp 74 — c — qi — gl geeglieaa-vm-eevdiwna-llgsvgli 

Mldxrp 85 — t — da — vt rl ve-et-e-adwlna-lvgalglr 

Mtdxrp 103 — i — py — hg sdaatrlve-qt-e-adwlna-lvgalglr 

Atdxrp 153 ykl — ei — ip geqgvieva-rh-p-eavtwtgivgcaglk 

Cjdxrp 69 — q — eg — le qiltecqdk-11 lna-ivgfaglk 

Pfdxrp 157 — i — il — cgde gmkeic — s-sn-s-idkivig-idsf qgly 

Stdxrp 107 pvmaaleaggtvalankeslvsagevmmaaarah-gatllpvdsehnavf 

Zmdxrp 109 atlaairkgktvalankeslvsagglmidavreh-gttllpvdsehnaif 

Padxrp 112 stlaaveagkrvllankealvmsgalfmqavkrs-gavllpidsehnaif 

Ecdxrp 109 ptlaairagktillankeslvtcgrlfmdavkqs-kaqllpvdsehnaif 

Nmdxrp 111 salaaaqkgktiylanketlwsgalfmetaran-gaavlpvdsehnavf 

Hidxrp 111 ptlsavkagkrvllankeslvtcgqlf idavkny-gskllpvdsehnaif 

Ssdxrp 111 ptmaaiaagkdialanketliagapwlplvekm-gvkllpadsehsaif 

Pmdxrp 115 ptlsavkagkrvllankealvtcgqlf idavres-qaqllpvdsehnaif 

Sldxrp 109 ptiaaieagkdialanketliaagpwlpllqkh-gvtitpadsehsaif 

Sgdxrp 109 ptlaalragrvlvlankeslivggplvkavaqp gqivpvdsehaalf 

Bsdxrp 107 ptlkaieqkktialanketlvtaghivkehakky-dvpllpvdsehsaif 

Mldxrp 112 ptlaalhtgarlalankeslvaggslvlaaaqp gqivpvdsehsala 

Mtdxrp 135 ptlaalktgarlalankeslvaggslvlraarp gqivpvdsehsala 

Atdxrp 188 ptvaaieagkdialanketliaggpf vlplankh-nvkilpadsehsaif 

Cjdxrp 96 stlkakelgknialankeslwagsf 1 k-gakf lpvdsehaalk 

Pfdxrp 189 stmyaimnnkivalankesivsagf f lkkllnihknakiipvdsehsaif 
156 qcldrtap 

158 qcfphhnr 

161 qslprnya 

158 qslpqpiq 

160 qvlprdytgrlne- 

160 qslppeaq 

160 qclqgvpe 

164 qslppeaq 

158 qciqglst 

156 qalaggar 

156 qalqgeqa .-- 

159 qclrggtp 

182 qclrggtp 

237 qciqglpe 

139 file— gk- 



239 qcldnnkvlktkclqdnf skin- 



-r g vrriiltasggp 

-d y vrriiltasggp 

-d glervgvrrilltasggp 

-hnlgyadleqng wsilltgsggp 

-h g iasiiltasggp 

-ekigf cplselg vskiiltgsggp 

-g g lrriiltasgga 

-rqigf cplselg iskivltgsggp 

-hadf rpaqwag lrriiltasgga 

-a e vrklwtasggp 

-k n ierliitasggs 

-d e vaklvltasggp 

-d e vaklvltasggp 

-g a lrkiiltasgga 

-k n iaklyitasgga 

--inkif lcssggp 



-n- 



178 fratpkeamrditpaqavahpnwsmgakisvdsatmmnkglelieafhlf 

180 frttslaematvtperavqhpnwsmgakisidsatmmnkglelieayhlf 
188 f retpleqlasvtpeqacahpnwsmgrkisvdsasmmnkglelieacwlf 
190 fretplrdlatmtpdqacrhpnwsmgrkisvdsatmmnkgleyiearwlf 
187 f ltadlntfdsitpdqavkhpnwrmgrkisvdsatmmnkglelieahwlf 
192 frytpleqftnitpeqavahpnwsmgkkisvdsatmmnkgleyiearwlf 
182 frdlpverlpfvtvqdalkhpnwsmgqkitidsatlmnkglevieahylf 
196 frytpleqfeqitpaqavahpnwsmgkkisvdsatmmnkgleyiearwlf 
190 frdwpverlsqvtvadalkhpnwsmgrkitvdsatlmnkglevieahylf 
178 f rnrtreqlaavtpadalahptwamgpwt insat lvnkglevieahlly 
178 frdktreelesvtvedalkhpnwsmgakitidsatmmnkglevieahwlf 

181 frgwnagdlervtpeqagvhptwsmgtmntlnsaslvnkglelieanllf 
204 frgwsaadlehvtpeqagahptwsmgpmntlnsaslvnkgleviethllf 
259 frdwpveklkevkvadalkhpnwnmgkkitvdsatlfnkglevieahylf 
159 fyrykikdlnqvsvkdalkhpnwnmgakitidsatmanklfeiieayhly 
274 fqnltmdelknvtsenalkhpkwkmgkkitidsatmmnkgleviethf lf 
Stdxrp 228 pvaaeqlavlvhrqswhsmveyvdgsvlaqlgtpdmrtpiayalawper 

Zmdxrp 230 qiplekf eilvhpqsvihsmveyldgsilaqigspdmrtpightlawpkr 

Padxrp 238 daqpsqvewihpqsvihsmvdyvdgsviaqlgnpdrartpisyamawper 

Ecdxrp 240 nasasqmevlihpqsvihsmvryqdgsvlaqlgepdmrtpiahtmawpnr 

Nmdxrp 237 ncppdklewihpqsvihsmvryrdgsvlaqlgnpdmrtpiayclglper 

Hidxrp 242 nasaeemeviihpqsiihsmvryvdgsvitqmgnpdmrtpiaetmayphr 

Ssdxrp 232 gldydhidivihpqsiihslievqdtsvlaqlgwpdmrlpllyalswper 

Pmdxrp 246 nasaeemeviihpqsiihsmvryidgsviaqmgnpdmrtpiaetmaypsr 

Sldxrp 240 gldydyidivihpqsiihslieledtsvlaqlgwpdmrlpllyalswpdr 

Sgdxrp 228 dvpf drievwhpqswhsmvefvdgstmaqasppdrtirmpialglgwpdr 

Bsdxrp 228 dipyeqidwlhkesiihsmvef hdksviaqlgtpdmrvpiqyaltypdr 

Mldxrp 231 gipydrievwhpqsivhsmvtfidgstiaqasppdmklpislalgwpqr 

Mtdxrp 254 gipydridwvhpqsiihsmvtf idgstiaqasppdmklpislalgwprr 

Atdxrp 309 gaeyddieivihpqsiihsmietqdssvlaqlgwpdmrlpilytmswpdr 

Cjdxrp 209 df — keidalieprslvhamcef kngastayf skadmklaisdaif — ek 

Pfdxrp 324 dvdyndievivhkeciihscvef idksvisqmyypdmqipilysltwpdr 

Stdxrp 278 m et-l-cppldlatvgklef enpdldrfpalalamealkaggarpai 

Zmdxrp 280 m et-p-aesldf tklrqmdf eapdyerfpaltlamesiksggarpav 

Padxrp 288 i ds-g-vspldmf avgrldf qrpdeqrfpclrlasqaaetggsapam 

Ecdxrp 290 v ns-g-vkpldf cklsaltf aapdydrypclklameaf eqgqaatta 

Nmdxrp 287 i ds-g-vgdldf dalsaltf qkpdfdrf pclklayeamnaggaapcv 

Hidxrp 292 t f a-g-vepldf f kikeltf iepdf nrypnlklaidaf aagqyatta 

Ssdxrp 282 i yt-d-wepldlvkagslsf repdhdkypcmqlaygagraggampav 

Pmdxrp 296 t va-g-vepldf yqlngltf iepdyqrypclklaidaf sagqyatta 

Sldxrp 290 1 st-q-wsaldlvkagslef repdhakypcmdlayaagrkggtmpav 

Sgdxrp 278 v pd-a-apgcdwtkaatwef fpldneafpavelarevgtlggtapav 

Bsdxrp 278 1 pl-pdakrlelweigslhf ekadfdrf rclqf af esgkiggtmpty 

Mldxrp 281 v gg-a-aracaf ttastwef epldidvfpavelarhagqiggcmtai 

Mtdxrp 304 v sg-a-aaacdf htasswef epldtdvfpavelarqagvaggcmtav 

Atdxrp 359 vpcsev-t-wprldlcklgsltf kkpdnvkypsmdlayaagraggtmtgv 

Cjdxrp 255 q dtpi-leavdf skmpalkf hpistkkypif klkntf lkepnl-gvi 

Pfdxrp 374 i kt-n-lkpldlaqvstltf hkpslehfpciklayqagikgnf yptv 
Stdxrp 323 lnaanevavaaf lagrigf leiaaisadtlsry d pa-a — pe — 

Zmdxrp 325 mnaaneiavaaf ldkkigf ldiakivektldhy 1 pa-t — ps — 

Padxrp 333 lnaanevavaaf lerhirf sdiaviiedvlnre a vt-a — ve — 

Ecdxrp 335 lnaaneitvaaf laqqirf tdiaalnlsvlekm d mr-e — pq — 

Nmdxrp 332 lnaaneaavaaf ldgqikf tdiaktvahclsqd f sd-g — ig — 

Hidxrp 337 mnaaneiavqaf ldrqigfmdiakinsktieri s py-t — iq — 

Ssdxrp 327 lnaaneqavalf lqekisf ldiprliektcdlyvgqn ta-s — pd — 

Pmdxrp 341 mnaaneiavasf ldnkikf tdiarlnqlwskl q pq-k — in — 

Sldxrp 335 lnaaneqavalf leeqihf sdiprlieracdrh q te-w — qqqp 

Sgdxrp 323 f naaneecvdaf lkgalpftgivdtvakwaeh gt — pq-s — gt — 

Bsdxrp 324 lnaanevavaaf lagkipf laiedciekaltrh qllkkp-s — wr — 

Mldxrp 326 ydaaneeaaeaf lqgrigf paivat iadvlqra d qw-a — pq — 

Mtdxrp 349 ynaaneeaaaaf lagrigf paivgiiadvlhaa d qw-avepa — 

Atdxrp 407 lsaanekavemfidekisyldifkweltcdkhrn-e lv-t — sp — 

Cjdxrp 300 inaanevgvynf lenksgf ldiakcif kaldhf g vp-k — is — 

Pfdxrp 419 lnasneiannlf lnnkikyf dissiisqvlesf n sqkv — se — 

Stdxrp 362 tldavlaid — aearlyaaervkdcva 

Zmdxrp 364 sledvfaid — neariqaaalmeslpa 

Padxrp 372 sldqvlaad — rrarsvagqwltrhag 

Ecdxrp 374 cvddvlsvd — anarevarkevmrlas 

Nmdxrp 371 diggllaqd — artraqaraf igtlr 

Hidxrp 376 niddvleid — aqareiaktllre 

Ssdxrp 369 -letilaad — qwarrtvlen-sacvatrp 

Pmdxrp 380 ciedvlevd — kkarelsqsiilsf shp 

Sldxrp 376 slddilayd — awarqfvqasyqslesw 

Sgdxrp 363 sltvedvlh — aes — warararelaag 

Bsdxrp 366 tfkkwtkip — gdtsiqyshkw-cs 

Mldxrp 365 wgegpatvddvldaqrwareralcavatassgkvsdmvlers 

Mtdxrp 390 tvddvl daqrwareraqravsgmasvaiastakpgaagrhastl 

Atdxrp 448 sleeivhyd — lwareyaanvqlssgarpvha 

Cjdxrp 339 sieevfeyd — fktreylrs 

Pfdxrp 459 nsedlmkqi — lqihswakdkatdiynkhn 